Message-ID: <CAPDyKFpRi8u3MPauT1hnYC1pW7L4kAohAZDsgS2pgQ=4_sjgNA@mail.gmail.com>
Date: Sat, 3 Jan 2026 12:12:51 +0100
From: Ulf Hansson <ulf.hansson@...aro.org>
To: Tabby Kitten <nyanpasu256@...il.com>
Cc: linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org,
Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: rtsx_pci_sdmmc aborts suspend when /sys/power/wakeup_count is enabled
+ Adrian
On Thu, 1 Jan 2026 at 05:58, Tabby Kitten <nyanpasu256@...il.com> wrote:
>
> Hi,
>
> It's been a few weeks since you looked into the bug. I think the merge window is over now. Have you had time to look into resolving this issue?
Yes, sorry for the delay.
See below for an attached patch. Please try it out and report back.
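
For context, the idea behind the patch can be sketched with a small
userspace model in C. This is only an illustration under simplifying
assumptions: every toy_* name below is hypothetical and none of it is
kernel code. It models a runtime-suspended host whose runtime resume
reports a wakeup event (as rtsx_pci_runtime_resume() ->
mmc_detect_change() does), and a suspend path that aborts if the wakeup
event count changed after it was armed through wakeup_count.

```c
/* Userspace toy model only; all toy_* names are made up for illustration. */
#include <assert.h>
#include <stdbool.h>

struct toy_host {
	int claim_cnt;              /* nesting depth of host claims */
	int rpm_usage;              /* runtime PM usage count */
	bool rpm_suspended;         /* host (and its parent) runtime suspended */
	unsigned int wakeup_events; /* wakeup events reported so far */
};

/*
 * Stand-in for pm_runtime_get_sync(): resuming a runtime-suspended reader
 * ends up reporting a wakeup event, as in the stack trace from the report.
 */
static void toy_runtime_get(struct toy_host *h)
{
	h->rpm_usage++;
	if (h->rpm_suspended) {
		h->rpm_suspended = false;
		h->wakeup_events++;
	}
}

/* Stand-in for __mmc_claim_host(): only the first claim touches runtime PM. */
static void toy_claim_host(struct toy_host *h, bool do_pm)
{
	bool first = (++h->claim_cnt == 1);

	if (do_pm && first)
		toy_runtime_get(h);
}

/* Stand-in for __mmc_release_host(): drop the PM reference on last release. */
static void toy_release_host(struct toy_host *h, bool do_pm)
{
	assert(h->claim_cnt > 0);
	if (--h->claim_cnt)
		return; /* release of a nested claim */
	if (do_pm)
		h->rpm_usage--;
}

/*
 * Stand-in for a system suspend armed via wakeup_count. Returns true if the
 * suspend can proceed, false if a wakeup event arrived and it must abort.
 */
static bool toy_system_suspend(struct toy_host *h, bool do_pm)
{
	unsigned int saved = h->wakeup_events; /* value written to wakeup_count */

	/* The claim/release pair that mmc_queue_suspend() performs. */
	toy_claim_host(h, do_pm);
	toy_release_host(h, do_pm);

	return h->wakeup_events == saved;
}
```

With a zero-initialised struct toy_host whose rpm_suspended is true,
toy_system_suspend(&h, true) returns false, which corresponds to the
aborted suspend in the report. toy_system_suspend(&h, false) returns true
and leaves the host runtime suspended.
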
Kind regards
Uffe
>
> Tabby
>
> On Tue, Dec 9, 2025 at 7:09 AM Ulf Hansson <ulf.hansson@...aro.org> wrote:
>>
>> Hi,
>>
>> On Wed, 26 Nov 2025 at 10:08, Tabby Kitten <nyanpasu256@...il.com> wrote:
>> >
>> > On a PC with a Realtek PCI Express SD reader, when you sleep with
>> > `wakeup_count` active (e.g. sleeping from KDE's lock screen), the MMC
>> > driver wakes up the system and aborts suspend.
>>
>> Okay, that's clearly a problem that needs to be fixed!
>>
>> >
>> > I've found a sleep failure bug in the rtsx_pci and mmc_core drivers.
>> > After userspace writes a number to `/sys/power/wakeup_count` (e.g. KDE
>> > Plasma does it to distinguish user wakes from timers and Wake-on-LAN),
>> > if it attempts a mem suspend it will be aborted when
>> > rtsx_pci_runtime_resume() -> mmc_detect_change() emits a
>> > pm_wakeup_ws_event(). This breaks sleep on some hardware and desktop
>> > environments.
>> >
>> > The detailed description:
>> > The recently released Plasma 6.5.0 writes to `/sys/power/wakeup_count`
>> > before sleeping. On my computer this caused the sleep attempt to fail
>> > with dmesg error "PM: Some devices failed to suspend, or early wake
>> > event detected". I got this error on both Arch Linux and Fedora, and
>> > replicated it on Fedora with the mainline kernel COPR. KDE is tracking
>> > this error at https://bugs.kde.org/show_bug.cgi?id=510992, and has
>> > disabled writing to wakeup_count on Plasma 6.5.3 to work around this
>> > issue.
>> >
>> > I've written a standalone shell script to reproduce this sleep failure
>> > (save as badsleep.sh):
>> >
>> > #!/bin/bash
>> > read wakeup_count < /sys/power/wakeup_count
>> > e=$?  # save the status now; the [[ ]] test below would overwrite $?
>> > if [[ $e -ne 0 ]]; then
>> >     echo "Failed to open wakeup_count, suspend may already be in progress"
>> >     exit $e
>> > fi
>> > echo $wakeup_count > /sys/power/wakeup_count
>> > e=$?  # save the status of the write for the same reason
>> > if [[ $e -ne 0 ]]; then
>> >     echo "Failed to write wakeup_count, wakeup_count may have changed in between"
>> >     exit $e
>> > fi
>> > echo mem > /sys/power/state
>> >
>> > Running `sudo ./badsleep.sh` reproduces failed sleeps on my computer.
>> > (sudo is needed to write to `/sys/power/wakeup_count` on Fedora.)
>> >
>> > * If I run the script unaltered, the screen turns off and on, and the
>> > terminal outputs
>> > `./badsleep.sh: line 14: echo: write error: Device or resource busy`
>> > indicating the mem sleep failed.
>> >
>> > * If I edit the script and comment out `echo $wakeup_count >
>> > /sys/power/wakeup_count`, the sleep succeeds, and waking the computer
>> > skips the lock screen and resumes where I left off.
>> >
>> > * If I run `sudo rmmod rtsx_pci_sdmmc` to disable the faulty module, the
>> > sleep succeeds, and waking the computer skips the lock screen and
>> > resumes where I left off.
>> >
>> > I think this problem happens in general when a driver spawns a wakeup
>> > event from its suspend callback. On my system, the driver in question
>> > lies in the MMC subsystem.
>> >
>> > ## Code debugging
>> >
>> > If I run `echo 1 > /sys/power/pm_debug_messages` to enable verbose
>> > logging, then attempt a failed sleep, I see output:
>> >
>> > PM: Wakeup pending, aborting suspend
>> > PM: active wakeup source: mmc0
>> > PM: suspend of devices aborted after 151.615 msecs
>> > PM: start suspend of devices aborted after 169.797 msecs
>> > PM: Some devices failed to suspend, or early wake event detected
>> >
>> > The "Wakeup pending, aborting suspend" message comes from function
>> > `pm_wakeup_pending()`. This function checks whether wakeup event
>> > checking is enabled and, if the wakeup counters have changed, aborts
>> > suspend and calls
>> > `pm_print_active_wakeup_sources()`, which prints `wakeup_sources`.
>> > Tracing the code that modifies `wakeup_sources`, I found that
>> > `pm_wakeup_ws_event()` would activate an event and
>> > `wakeup_source_register() → wakeup_source_add()` would add a new one.
>>
>> Thanks for all the details!
>>
>> >
>> > To find who changed wakeup events, I used my stacksnoop fork at
>> > https://github.com/nyanpasu64/bcc/blob/local/examples/tracing/stacksnoop.py
>> > to trace a failed suspend:
>> >
>> > nyanpasu64@...en ~/code/bcc (local)> sudo ./examples/tracing/stacksnoop.py pm_wakeup_ws_event wakeup_source_register
>> > TIME(s) FUNCTION
>> > 7.254676819:
>> > 0: ret_from_fork_asm [kernel]
>> > 1: ret_from_fork [kernel]
>> > 2: kthread [kernel]
>> > 3: worker_thread [kernel]
>> > 4: process_one_work [kernel]
>> > 5: async_run_entry_fn [kernel]
>> > 6: async_suspend [kernel]
>> > 7: device_suspend [kernel]
>> > 8: dpm_run_callback [kernel]
>> > 9: mmc_bus_suspend [mmc_core]
>> > 10: mmc_blk_suspend [mmc_block]
>> > 11: mmc_queue_suspend [mmc_block]
>> > 12: __mmc_claim_host [mmc_core]
>> > 13: __pm_runtime_resume [kernel]
>> > 14: rpm_resume [kernel]
>> > 15: rpm_resume [kernel]
>> > 16: rpm_callback [kernel]
>> > 17: __rpm_callback [kernel]
>> > 18: rtsx_pci_runtime_resume [rtsx_pci]
>> > 19: mmc_detect_change [mmc_core]
>> > 20: pm_wakeup_ws_event [kernel]
>> >
>> > On a previous kernel, stack frames 9-12 were replaced by a single call to
>> > `pci_pm_suspend`. I've posted my detailed debugging on the older kernel
>> > at https://bugs.kde.org/show_bug.cgi?id=510992#c26. There I found that
>> > `pci_pm_suspend()` wakes PCI(e) devices before sending them into a full
>> > sleep state, but in the process, `_mmc_detect_change()` will "Prevent
>> > system sleep for 5s to allow user space to consume the corresponding
>> > uevent"... which interrupts a system sleep in progress.
>> >
>> > On my current kernel, the same logic applies, but reading the source I
>> > can't tell where `__mmc_claim_host()` is actually calling
>> > `__pm_runtime_resume()`. Nonetheless the problem remains that
>> > `rpm_resume()` is called during system suspend, `mmc_detect_change()`
>> > wakes the system when called, and this will abort system sleep when
>> > `/sys/power/wakeup_count` is active.
>>
>> __mmc_claim_host() will call pm_runtime_get_sync() to runtime resume
>> the mmc host device.
>>
>> The mmc host device's parent (a pci device) will then be runtime
>> resumed too. That's the call to rtsx_pci_runtime_resume() we see
>> above.
>>
>> The problem is then that rtsx_pci_runtime_resume() invokes a callback
>> (->card_event()) back into the mmc host driver
>> (drivers/mmc/host/rtsx_pci_sdmmc.c), which ends up calling
>> mmc_detect_change() to try to detect whether a card has been
>> inserted/removed.
>>
>> >
>> > ## Next steps
>> >
>> > How would this problem be addressed? Off the top of my head, perhaps you
>> > could not call `__pm_runtime_resume()` on a SD card reader during the
>> > `device_suspend()` process, not call `pm_wakeup_ws_event()` when the SD
>> > card status changes, not call `pm_wakeup_ws_event()` *specifically*
>> > when system suspend is temporarily waking up a SD card reader, or
>> > disable pm_wakeup_ws_event() entirely during the suspend process (does
>> > this defeat the purpose of the function?).
>>
>> Let me think a bit on what makes the best sense here. I will get back
>> to you in a couple of days.
>>
>> >
>> > Are there other drivers which cause the same symptoms? I don't know. I
>> > asked on the KDE bug tracker for other users to attempt a failed sleep
>> > with `echo 1 > /sys/power/pm_debug_messages` active, to identify which
>> > driver broke suspend in their system; so far nobody has replied with
>> > logs.
>> >
>> > Given that this bug is related to `/sys/power/wakeup_count`
>> > (https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power), I
>> > was considering CCing Rafael J. Wysocki <rafael@...nel.org> and
>> > linux-pm@...r.kernel.org, but have decided to only message the MMC
>> > maintainers for now. If necessary we may have to forward this message
>> > there to get their attention.
>> >
>> > ----
>> >
>> > System information:
>> >
>> > * I have an Intel NUC8i7BEH mini PC, with CPU 8 × Intel® Core™ i7-8559U
>> > CPU @ 2.70GHz.
>> >
>> > * uname -mi prints `x86_64 unknown`.
>> >
>> > * `lspci -nn` prints
>> > "6e:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader [10ec:522a] (rev 01)".
>> >
>> > * I am running kernel 6.18.0-0.rc7.357.vanilla.fc43.x86_64 from the Fedora COPRs
>> > (https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories).
>> >
>> > * dmesg at https://gist.github.com/nyanpasu64/ab5d3d1565aafe6c1c08cbcaf074e44a#file-dmesg-2025-11-25-txt
>> >
>> > * Fully resolved config at https://gist.github.com/nyanpasu64/ab5d3d1565aafe6c1c08cbcaf074e44a#file-config-6-18-0-0-rc7-357-vanilla-fc43-x86_64,
>> > source at https://download.copr.fedorainfracloud.org/results/@kernel-vanilla/mainline-wo-mergew/fedora-43-x86_64/09831015-mainline-womergew-releases/kernel-6.18.0-0.rc7.357.vanilla.fc43.src.rpm
>>
>> Kind regards
>> Uffe
From: Ulf Hansson <ulf.hansson@...aro.org>
Date: Sat, 3 Jan 2026 11:55:44 +0100
Subject: [PATCH] mmc: core: Avoid runtime PM of host in mmc_queue_suspend()
WIP
Signed-off-by: Ulf Hansson <ulf.hansson@...aro.org>
---
drivers/mmc/core/core.c | 18 ++++++++++++------
drivers/mmc/core/core.h | 11 ++++++++---
drivers/mmc/core/queue.c | 4 ++--
drivers/mmc/core/sdio_irq.c | 2 +-
4 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 860378bea557..c3923522833a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -781,6 +781,7 @@ static inline void mmc_ctx_set_claimer(struct mmc_host *host,
* @ctx: context that claims the host or NULL in which case the default
* context will be used
* @abort: whether or not the operation should be aborted
+ * @do_pm: whether to use runtime PM or not
*
* Claim a host for a set of operations. If @abort is non null and
* dereference a non-zero value then this will return prematurely with
@@ -788,7 +789,7 @@ static inline void mmc_ctx_set_claimer(struct mmc_host *host,
* with the lock held otherwise.
*/
int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx,
- atomic_t *abort)
+ atomic_t *abort, bool do_pm)
{
struct task_struct *task = ctx ? NULL : current;
DECLARE_WAITQUEUE(wait, current);
@@ -821,7 +822,7 @@ int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx,
spin_unlock_irqrestore(&host->lock, flags);
remove_wait_queue(&host->wq, &wait);
- if (pm)
+ if (do_pm && pm)
pm_runtime_get_sync(mmc_dev(host));
return stop;
@@ -829,13 +830,14 @@ int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx,
EXPORT_SYMBOL(__mmc_claim_host);
/**
- * mmc_release_host - release a host
+ * __mmc_release_host - release a host
* @host: mmc host to release
+ * @do_pm: whether to use runtime PM or not
*
* Release a MMC host, allowing others to claim the host
* for their operations.
*/
-void mmc_release_host(struct mmc_host *host)
+void __mmc_release_host(struct mmc_host *host, bool do_pm)
{
unsigned long flags;
@@ -851,6 +853,10 @@ void mmc_release_host(struct mmc_host *host)
host->claimer = NULL;
spin_unlock_irqrestore(&host->lock, flags);
wake_up(&host->wq);
+
+ if (!do_pm)
+ return;
+
pm_runtime_mark_last_busy(mmc_dev(host));
if (host->caps & MMC_CAP_SYNC_RUNTIME_PM)
pm_runtime_put_sync_suspend(mmc_dev(host));
@@ -858,7 +864,7 @@ void mmc_release_host(struct mmc_host *host)
pm_runtime_put_autosuspend(mmc_dev(host));
}
}
-EXPORT_SYMBOL(mmc_release_host);
+EXPORT_SYMBOL(__mmc_release_host);
/*
* This is a helper function, which fetches a runtime pm reference for the
@@ -867,7 +873,7 @@ EXPORT_SYMBOL(mmc_release_host);
void mmc_get_card(struct mmc_card *card, struct mmc_ctx *ctx)
{
pm_runtime_get_sync(&card->dev);
- __mmc_claim_host(card->host, ctx, NULL);
+ __mmc_claim_host(card->host, ctx, NULL, true);
}
EXPORT_SYMBOL(mmc_get_card);
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index a028b48be164..5979c90d3b09 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -135,8 +135,8 @@ unsigned int mmc_calc_max_discard(struct mmc_card *card);
int mmc_set_blocklen(struct mmc_card *card, unsigned int blocklen);
int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx,
- atomic_t *abort);
-void mmc_release_host(struct mmc_host *host);
+ atomic_t *abort, bool do_pm);
+void __mmc_release_host(struct mmc_host *host, bool do_pm);
void mmc_get_card(struct mmc_card *card, struct mmc_ctx *ctx);
void mmc_put_card(struct mmc_card *card, struct mmc_ctx *ctx);
@@ -150,7 +150,12 @@ int mmc_card_alternative_gpt_sector(struct mmc_card *card, sector_t *sector);
*/
static inline void mmc_claim_host(struct mmc_host *host)
{
- __mmc_claim_host(host, NULL, NULL);
+ __mmc_claim_host(host, NULL, NULL, true);
+}
+
+static inline void mmc_release_host(struct mmc_host *host)
+{
+ __mmc_release_host(host, true);
}
int mmc_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 284856c8f655..76e83f49ff4e 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -477,8 +477,8 @@ void mmc_queue_suspend(struct mmc_queue *mq)
* The host remains claimed while there are outstanding requests, so
* simply claiming and releasing here ensures there are none.
*/
- mmc_claim_host(mq->card->host);
- mmc_release_host(mq->card->host);
+ __mmc_claim_host(mq->card->host, NULL, NULL, false);
+ __mmc_release_host(mq->card->host, false);
}
void mmc_queue_resume(struct mmc_queue *mq)
diff --git a/drivers/mmc/core/sdio_irq.c b/drivers/mmc/core/sdio_irq.c
index 2b24bdf38296..e5d4f8c634c8 100644
--- a/drivers/mmc/core/sdio_irq.c
+++ b/drivers/mmc/core/sdio_irq.c
@@ -172,7 +172,7 @@ static int sdio_irq_thread(void *_host)
* that doesn't require that lock to be held.
*/
ret = __mmc_claim_host(host, NULL,
- &host->sdio_irq_thread_abort);
+ &host->sdio_irq_thread_abort, true);
if (ret)
break;
ret = process_sdio_pending_irqs(host);
--
2.43.0