[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f91584df-eee3-4c26-b1da-a8049a4d77bd@intel.com>
Date: Mon, 5 Jan 2026 14:31:16 +0200
From: Adrian Hunter <adrian.hunter@...el.com>
To: Ulf Hansson <ulf.hansson@...aro.org>, Tabby Kitten
<nyanpasu256@...il.com>, <ricky_wu@...ltek.com>
CC: <linux-mmc@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: rtsx_pci_sdmmc aborts suspend when /sys/power/wakeup_count is
enabled
On 03/01/2026 13:12, Ulf Hansson wrote:
> + Adrian
+ Ricky WU <ricky_wu@...ltek.com>
>
> On Thu, 1 Jan 2026 at 05:58, Tabby Kitten <nyanpasu256@...il.com> wrote:
>>
>> Hi,
>>
>> It's been a few weeks since you looked into the bug. I think the merge window is over now, have you had the time to look into resolving this issue?
>
> Yes, sorry for the delay.
>
> See below for an attached patch. Please try it out and report back.
>
> Kind regards
> Uffe
>
>>
>> Tabby
>>
>> On Tue, Dec 9, 2025 at 7:09 AM Ulf Hansson <ulf.hansson@...aro.org> wrote:
>>>
>>> Hi,
>>>
>>> On Wed, 26 Nov 2025 at 10:08, Tabby Kitten <nyanpasu256@...il.com> wrote:
>>>>
>>>> On a PC with a Realtek PCI Express SD reader, when you sleep with
>>>> `wakeup_count` active (eg. sleeping from KDE's lock screen), the MMC
>>>> driver wakes up the system and aborts suspend.
>>>
>>> Okay, that's clearly a problem that needs to be fixed!
>>>
>>>>
>>>> I've found a sleep failure bug in the rtsx_pci and mmc_core drivers.
>>>> After userspace writes a number to `/sys/power/wakeup_count` (eg. KDE
>>>> Plasma does it to distinguish user wakes from timers and Wake-on-LAN),
>>>> if it attempts a mem suspend it will be aborted when
>>>> rtsx_pci_runtime_resume() -> mmc_detect_change() emits a
>>>> pm_wakeup_ws_event(). This breaks sleep on some hardware and desktop
>>>> environments.
>>>>
>>>> The detailed description:
>>>> The recently released Plasma 6.5.0 writes to `/sys/power/wakeup_count`
>>>> before sleeping. On my computer this caused the sleep attempt to fail
>>>> with dmesg error "PM: Some devices failed to suspend, or early wake
>>>> event detected". I got this error on both Arch Linux and Fedora, and
>>>> replicated it on Fedora with the mainline kernel COPR. KDE is tracking
>>>> this error at https://bugs.kde.org/show_bug.cgi?id=510992, and have
>>>> disabled writing to wakeup_count on Plasma 6.5.3 to work around this
>>>> issue.
>>>>
>>>> I've written a standalone shell script to reproduce this sleep failure
>>>> (save as badsleep.sh):
>>>>
>>>> #!/bin/bash
>>>> read wakeup_count < /sys/power/wakeup_count
>>>> if [[ $? -ne 0 ]]; then
>>>> e=$?
>>>> echo "Failed to open wakeup_count, suspend maybe already in progress"
>>>> exit $e
>>>> fi
>>>> echo $wakeup_count > /sys/power/wakeup_count
>>>> if [[ $? -ne 0 ]]; then
>>>> e=$?
>>>> echo "Failed to write wakeup_count, wakeup_count may have changed in between"
>>>> exit $e
>>>> fi
>>>> echo mem > /sys/power/state
>>>>
>>>> Running `sudo ./badsleep.sh` reproduces failed sleeps on my computer.
>>>> (sudo is needed to write to `/sys/power/wakeup_count` on Fedora.)
>>>>
>>>> * If I run the script unaltered, the screen turns off and on, and the
>>>> terminal outputs
>>>> `./badsleep.sh: line 14: echo: write error: Device or resource busy`
>>>> indicating the mem sleep failed.
>>>>
>>>> * If I edit the script and comment out `echo $wakeup_count >
>>>> /sys/power/wakeup_count`, the sleep succeeds, and waking the computer
>>>> skips the lock screen and resumes where I left off.
>>>>
>>>> * If I run `sudo rmmod rtsx_pci_sdmmc` to disable the faulty module, the
>>>> sleep succeeds, and waking the computer skips the lock screen and
>>>> resumes where I left off.
>>>>
>>>> I think this problem happens in general when a driver spawns a wakeup
>>>> event from its suspend callback. On my system, the driver in question
>>>> lies in the MMC subsystem.
>>>>
>>>> ## Code debugging
>>>>
>>>> If I run `echo 1 > /sys/power/pm_debug_messages` to enable verbose
>>>> logging, then attempt a failed sleep, I see output:
>>>>
>>>> PM: Wakeup pending, aborting suspend
>>>> PM: active wakeup source: mmc0
>>>> PM: suspend of devices aborted after 151.615 msecs
>>>> PM: start suspend of devices aborted after 169.797 msecs
>>>> PM: Some devices failed to suspend, or early wake event detected
>>>>
>>>> The "Wakeup pending, aborting suspend" message comes from function
>>>> `pm_wakeup_pending()`. This function checks if event checks are enabled,
>>>> and if some counters have changed aborts suspend and calls
>>>> `pm_print_active_wakeup_sources()`, which prints `wakeup_sources`.
>>>> Tracing the code that modifies `wakeup_sources`, I found that
>>>> `pm_wakeup_ws_event()` would activate an event and
>>>> `wakeup_source_register() → wakeup_source_add()` would add a new one.
>>>
>>> Thanks for all the details!
>>>
>>>>
>>>> To find who changed wakeup events, I used my stacksnoop fork at
>>>> https://github.com/nyanpasu64/bcc/blob/local/examples/tracing/stacksnoop
>>>> .py to trace a failed suspend:
>>>>
>>>> nyanpasu64@...en ~/code/bcc (local)> sudo ./examples/tracing/stacksnoop.py pm_wakeup_ws_event wakeup_source_register
>>>> TIME(s) FUNCTION
>>>> 7.254676819:
>>>> 0: ret_from_fork_asm [kernel]
>>>> 1: ret_from_fork [kernel]
>>>> 2: kthread [kernel]
>>>> 3: worker_thread [kernel]
>>>> 4: process_one_work [kernel]
>>>> 5: async_run_entry_fn [kernel]
>>>> 6: async_suspend [kernel]
>>>> 7: device_suspend [kernel]
>>>> 8: dpm_run_callback [kernel]
>>>> 9: mmc_bus_suspend [mmc_core]
>>>> 10: mmc_blk_suspend [mmc_block]
>>>> 11: mmc_queue_suspend [mmc_block]
>>>> 12: __mmc_claim_host [mmc_core]
>>>> 13: __pm_runtime_resume [kernel]
>>>> 14: rpm_resume [kernel]
>>>> 15: rpm_resume [kernel]
>>>> 16: rpm_callback [kernel]
>>>> 17: __rpm_callback [kernel]
>>>> 18: rtsx_pci_runtime_resume [rtsx_pci]
>>>> 19: mmc_detect_change [mmc_core]
>>>> 20: pm_wakeup_ws_event [kernel]
>>>>
>>>> On a previous kernel, lines 9-12 were replaced by a single call to
>>>> `pci_pm_suspend`. I've posted my detailed debugging on the older kernel
>>>> at https://bugs.kde.org/show_bug.cgi?id=510992#c26. There I found that
>>>> `pci_pm_suspend()` wakes PCI(e) devices before sending them into a full
>>>> sleep state, but in the process, `_mmc_detect_change()` will "Prevent
>>>> system sleep for 5s to allow user space to consume the\n corresponding
>>>> uevent"... which interrupts a system sleep in progress.
>>>>
>>>> On my current kernel, the same logic applies, but reading the source I
>>>> can't tell where `__mmc_claim_host()` is actually calling
>>>> `__pm_runtime_resume()`. Nonetheless the problem remains that
>>>> `rpm_resume()` is called during system suspend, `mmc_detect_change()`
>>>> wakes the system when called, and this will abort system sleep when
>>>> `/sys/power/wakeup_count` is active.
>>>
>>> __mmc_claim_host() will call pm_runtime_get_sync() to runtime resume
>>> the mmc host device.
>>>
>>> The mmc host device's parent (a pci device) will then be runtime
>>> resumed too. That's the call to rtsx_pci_runtime_resume() we see
>>> above.
>>>
>>> The problem is then that rtsx_pci_runtime_resume() invokes a callback
>>> (->card_event())) back into the mmc host driver
>>> (drivers/mmc/host/rtsx_pci_sdmmc.c), which ends up calling
>>> mmc_detect_change() to try to detect whether a card have been
>>> inserted/removed.
>>>
>>>>
>>>> ## Next steps
>>>>
>>>> How would this problem be addressed? Off the top of my head, perhaps you
>>>> could not call `__pm_runtime_resume()` on a SD card reader during the
>>>> `device_suspend()` process, not call `pm_wakeup_ws_event()` when the SD
>>>> card status changes, not call `pm_wakeup_ws_event()` *specifically*
>>>> when system suspend is temporarily waking up a SD card reader, or
>>>> disable pm_wakeup_ws_event() entirely during the suspend process (does
>>>> this defeat the purpose of the function?).
>>>
>>> Let me think a bit on what makes the best sense here. I will get back
>>> to you in a couple of days.
>>>
>>>>
>>>> Are there other drivers which cause the same symptoms? I don't know. I
>>>> asked on the KDE bug tracker for other users to attempt a failed sleep
>>>> with `echo 1 > /sys/power/pm_debug_messages` active, to identify which
>>>> driver broke suspend in their system; so far nobody has replied with
>>>> logs.
>>>>
>>>> Given that this bug is related to `/sys/power/wakeup_count`
>>>> (https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power), I
>>>> was considering CCing Rafael J. Wysocki <rafael@...nel.org> and
>>>> linux-pm@...r.kernel.org, but have decided to only message the MMC
>>>> maintainers for now. If necessary we may have to forward this message
>>>> there to get their attention.
>>>>
>>>> ----
>>>>
>>>> System information:
>>>>
>>>> * I have an Intel NUC8i7BEH mini PC, with CPU 8 × Intel® Core™ i7-8559U
>>>> CPU @ 2.70GHz.
>>>>
>>>> * uname -mi prints `x86_64 unknown`.
>>>>
>>>> * `lspci -nn` prints
>>>> "6e:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader [10ec:522a] (rev 01)".
>>>>
>>>> * I am running kernel 6.18.0-0.rc7.357.vanilla.fc43.x86_64 from the Fedora COPRs
>>>> (https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories).
>>>>
>>>> * dmesg at https://gist.github.com/nyanpasu64/ab5d3d1565aafe6c1c08cbcaf074e44a#file-dmesg-2025-11-25-txt
>>>>
>>>> * Fully resolved config at https://gist.github.com/nyanpasu64/ab5d3d1565aafe6c1c08cbcaf074e44a#file-config-6-18-0-0-rc7-357-vanilla-fc43-x86_64,
>>>> source at https://download.copr.fedorainfracloud.org/results/@kernel-vanilla/mainline-wo-mergew/fedora-43-x86_64/09831015-mainline-womergew-releases/kernel-6.18.0-0.rc7.357.vanilla.fc43.src.rpm
>>>
>>> Kind regards
>>> Uffe
>
> From: Ulf Hansson <ulf.hansson@...aro.org>
> Date: Sat, 3 Jan 2026 11:55:44 +0100
> Subject: [PATCH] mmc: core: Avoid runtime PM of host in mmc_queue_suspend()
Seems reasonable, but isn't there also:
bus_ops->suspend() == mmc_sd_suspend()
_mmc_sd_suspend()
mmc_claim_host(host)
In general, it looks difficult to avoid runtime resume on
the suspend path. PCI will usually runtime resume anyway
i.e. from pci_pm_suspend():
/*
* PCI devices suspended at run time may need to be resumed at this
* point, because in general it may be necessary to reconfigure them for
* system suspend. Namely, if the device is expected to wake up the
* system from the sleep state, it may have to be reconfigured for this
* purpose, or if the device is not expected to wake up the system from
* the sleep state, it should be prevented from signaling wakeup events
* going forward.
*
* Also if the driver of the device does not indicate that its system
* suspend callbacks can cope with runtime-suspended devices, it is
* better to resume the device from runtime suspend here.
*/
if (!dev_pm_smart_suspend(dev) || pci_dev_need_resume(pci_dev)) {
pm_runtime_resume(dev);
So maybe alter rtsx_pci_runtime_resume() to avoid calling
pcr->slots[RTSX_SD_CARD].card_event() == rtsx_pci_sdmmc_card_event()
when suspending. Perhaps along the lines of the hack below:
static int rtsx_pci_runtime_resume(struct device *device)
{
struct pci_dev *pcidev = to_pci_dev(device);
struct pcr_handle *handle = pci_get_drvdata(pcidev);
struct rtsx_pcr *pcr = handle->pcr;
dev_dbg(device, "--> %s\n", __func__);
mutex_lock(&pcr->pcr_mutex);
rtsx_pci_write_register(pcr, HOST_SLEEP_STATE, 0x03, 0x00);
rtsx_pci_init_hw(pcr);
if (pcr->slots[RTSX_SD_CARD].p_dev != NULL) {
+#if IS_ENABLED(CONFIG_SUSPEND)
+ if (pm_suspend_target_state == PM_SUSPEND_ON)
+#endif
pcr->slots[RTSX_SD_CARD].card_event(
pcr->slots[RTSX_SD_CARD].p_dev);
}
mutex_unlock(&pcr->pcr_mutex);
return 0;
}
>
> WIP
>
> Signed-off-by: Ulf Hansson <ulf.hansson@...aro.org>
> ---
> drivers/mmc/core/core.c | 18 ++++++++++++------
> drivers/mmc/core/core.h | 11 ++++++++---
> drivers/mmc/core/queue.c | 4 ++--
> drivers/mmc/core/sdio_irq.c | 2 +-
> 4 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 860378bea557..c3923522833a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -781,6 +781,7 @@ static inline void mmc_ctx_set_claimer(struct
> mmc_host *host,
> * @ctx: context that claims the host or NULL in which case the default
> * context will be used
> * @abort: whether or not the operation should be aborted
> + * @do_pm: whether to use runtime PM or not
> *
> * Claim a host for a set of operations. If @abort is non null and
> * dereference a non-zero value then this will return prematurely with
> @@ -788,7 +789,7 @@ static inline void mmc_ctx_set_claimer(struct
> mmc_host *host,
> * with the lock held otherwise.
> */
> int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx,
> - atomic_t *abort)
> + atomic_t *abort, bool do_pm)
> {
> struct task_struct *task = ctx ? NULL : current;
> DECLARE_WAITQUEUE(wait, current);
> @@ -821,7 +822,7 @@ int __mmc_claim_host(struct mmc_host *host, struct
> mmc_ctx *ctx,
> spin_unlock_irqrestore(&host->lock, flags);
> remove_wait_queue(&host->wq, &wait);
>
> - if (pm)
> + if (do_pm && pm)
> pm_runtime_get_sync(mmc_dev(host));
>
> return stop;
> @@ -829,13 +830,14 @@ int __mmc_claim_host(struct mmc_host *host,
> struct mmc_ctx *ctx,
> EXPORT_SYMBOL(__mmc_claim_host);
>
> /**
> - * mmc_release_host - release a host
> + * __mmc_release_host - release a host
> * @host: mmc host to release
> + * @do_pm: whether to use runtime PM or not
> *
> * Release a MMC host, allowing others to claim the host
> * for their operations.
> */
> -void mmc_release_host(struct mmc_host *host)
> +void __mmc_release_host(struct mmc_host *host, bool do_pm)
> {
> unsigned long flags;
>
> @@ -851,6 +853,10 @@ void mmc_release_host(struct mmc_host *host)
> host->claimer = NULL;
> spin_unlock_irqrestore(&host->lock, flags);
> wake_up(&host->wq);
> +
> + if (!do_pm)
> + return;
> +
> pm_runtime_mark_last_busy(mmc_dev(host));
> if (host->caps & MMC_CAP_SYNC_RUNTIME_PM)
> pm_runtime_put_sync_suspend(mmc_dev(host));
> @@ -858,7 +864,7 @@ void mmc_release_host(struct mmc_host *host)
> pm_runtime_put_autosuspend(mmc_dev(host));
> }
> }
> -EXPORT_SYMBOL(mmc_release_host);
> +EXPORT_SYMBOL(__mmc_release_host);
>
> /*
> * This is a helper function, which fetches a runtime pm reference for the
> @@ -867,7 +873,7 @@ EXPORT_SYMBOL(mmc_release_host);
> void mmc_get_card(struct mmc_card *card, struct mmc_ctx *ctx)
> {
> pm_runtime_get_sync(&card->dev);
> - __mmc_claim_host(card->host, ctx, NULL);
> + __mmc_claim_host(card->host, ctx, NULL, true);
> }
> EXPORT_SYMBOL(mmc_get_card);
>
> diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> index a028b48be164..5979c90d3b09 100644
> --- a/drivers/mmc/core/core.h
> +++ b/drivers/mmc/core/core.h
> @@ -135,8 +135,8 @@ unsigned int mmc_calc_max_discard(struct mmc_card *card);
> int mmc_set_blocklen(struct mmc_card *card, unsigned int blocklen);
>
> int __mmc_claim_host(struct mmc_host *host, struct mmc_ctx *ctx,
> - atomic_t *abort);
> -void mmc_release_host(struct mmc_host *host);
> + atomic_t *abort, bool do_pm);
> +void __mmc_release_host(struct mmc_host *host, bool do_pm);
> void mmc_get_card(struct mmc_card *card, struct mmc_ctx *ctx);
> void mmc_put_card(struct mmc_card *card, struct mmc_ctx *ctx);
>
> @@ -150,7 +150,12 @@ int mmc_card_alternative_gpt_sector(struct
> mmc_card *card, sector_t *sector);
> */
> static inline void mmc_claim_host(struct mmc_host *host)
> {
> - __mmc_claim_host(host, NULL, NULL);
> + __mmc_claim_host(host, NULL, NULL, true);
> +}
> +
> +static inline void mmc_release_host(struct mmc_host *host)
> +{
> + __mmc_release_host(host, true);
> }
>
> int mmc_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq);
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 284856c8f655..76e83f49ff4e 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -477,8 +477,8 @@ void mmc_queue_suspend(struct mmc_queue *mq)
> * The host remains claimed while there are outstanding requests, so
> * simply claiming and releasing here ensures there are none.
> */
> - mmc_claim_host(mq->card->host);
> - mmc_release_host(mq->card->host);
> + __mmc_claim_host(mq->card->host, NULL, NULL, false);
> + __mmc_release_host(mq->card->host, false);
> }
>
> void mmc_queue_resume(struct mmc_queue *mq)
> diff --git a/drivers/mmc/core/sdio_irq.c b/drivers/mmc/core/sdio_irq.c
> index 2b24bdf38296..e5d4f8c634c8 100644
> --- a/drivers/mmc/core/sdio_irq.c
> +++ b/drivers/mmc/core/sdio_irq.c
> @@ -172,7 +172,7 @@ static int sdio_irq_thread(void *_host)
> * that doesn't require that lock to be held.
> */
> ret = __mmc_claim_host(host, NULL,
> - &host->sdio_irq_thread_abort);
> + &host->sdio_irq_thread_abort, true);
> if (ret)
> break;
> ret = process_sdio_pending_irqs(host);
Powered by blists - more mailing lists