Message-ID: <9f666390-653d-4834-800d-8997665b6dac@intel.com>
Date: Wed, 15 Oct 2025 10:09:01 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Michael Wu <michael@...winnertech.com>, <ulf.hansson@...aro.org>,
<linus.walleij@...aro.org>, <brgl@...ev.pl>, <avri.altman@....com>,
<wsa+renesas@...g-engineering.com>, <victor.shih@...esyslogic.com.tw>,
<andy-ld.lu@...iatek.com>
CC: <jason.lai@...esyslogic.com.tw>, <linux-mmc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-gpio@...r.kernel.org>
Subject: Re: [RESEND] mmc: core: Fix system shutdown hang in mmc_bus_shutdown

On 15/10/2025 09:07, Michael Wu wrote:
> During system shutdown, mmc_bus_shutdown() calls __mmc_stop_host() which
> uses cancel_delayed_work_sync(). This can block indefinitely if the work
> queue is stuck, causing the system to hang during shutdown.
>
> This patch introduces a new function __mmc_stop_host_no_sync() that skips
> the synchronous work cancellation, preventing potential shutdown hangs.

The mmc core must ensure that no ongoing operations race with shutdown.
Leaving the work running does not look like it would achieve that.

Perhaps it can be cancelled earlier? There seems to be a "reboot" notifier
associated with shutdown; see reboot_notifier_list and
register_reboot_notifier(). Note that, in that case, it is also necessary to
ensure nothing can queue the work again.

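To make the ordering concrete, here is a toy userspace model of that idea. It is not kernel code and not a proposed patch: the struct and the model_* functions are hypothetical names chosen only to mirror host->rescan_disable and the host->detect work, and the "cancel" step stands in for what would be cancel_delayed_work_sync() in a reboot notifier. It only illustrates that the requeue must be blocked before the work is cancelled, so that the later shutdown path has nothing left to wait for.

```c
/* Userspace toy model, NOT kernel code. All names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct host_model {
	bool rescan_disable;	/* mirrors host->rescan_disable */
	bool work_pending;	/* stands in for the queued host->detect work */
};

/* Stand-in for queueing the detect work: refused once rescans are disabled. */
bool model_queue_detect(struct host_model *h)
{
	assert(h != NULL);
	if (h->rescan_disable)
		return false;
	h->work_pending = true;
	return true;
}

/* What a hypothetical reboot notifier would do, i.e. before device shutdown. */
void model_reboot_notify(struct host_model *h)
{
	assert(h != NULL);
	h->rescan_disable = true;	/* first: nothing can queue the work again */
	h->work_pending = false;	/* then: cancel (a sync cancel in the kernel) */
}

/* Shutdown path: true if there is no work left that it would have to wait for. */
bool model_shutdown_is_nonblocking(const struct host_model *h)
{
	assert(h != NULL);
	return h->rescan_disable && !h->work_pending;
}
```

In this model, a queue attempt made after model_reboot_notify() is refused, so model_shutdown_is_nonblocking() holds by the time the shutdown path runs. Whether the real notifier runs early enough to avoid the lock contention in the trace below would still need to be checked against the actual shutdown sequence.
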
> The function is used in mmc_bus_shutdown() where blocking is not
> acceptable during system shutdown.
>
> Changes:
> - Add __mmc_stop_host_no_sync() function that avoids cancel_delayed_work_sync()
> - Update mmc_bus_shutdown() to use the new non-blocking function
> - Keep the original __mmc_stop_host() unchanged for normal operation
>
> This ensures graceful system shutdown while maintaining existing
> functionality for regular MMC host operations.
>
> stack information when an error occurs:
> INFO: task init:1 blocked for more than 720 seconds.
> Tainted: G OE 5.15.185-android13-8-00043-gd00fb6bce7ed-ab13792018 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000008
> Call trace:
> __switch_to+0x234/0x470
> __schedule+0x694/0xb8c
> schedule+0x150/0x254
> schedule_timeout+0x48/0x138
> wait_for_common+0x144/0x308
> __flush_work+0x3d8/0x508
> __cancel_work_timer+0x120/0x2e8
> mmc_bus_shutdown+0x90/0x158
> device_shutdown+0x204/0x434
> kernel_restart+0x54/0x220
> kernel_restart+0x0/0x220
> invoke_syscall+0x60/0x150
> el0_svc_common+0xb8/0xf8
> do_el0_svc+0x28/0x98
> el0_svc+0x24/0x84
> el0t_64_sync_handler+0x88/0xec
> el0t_64_sync+0x1b8/0x1bc
> INFO: task kworker/1:1:73 blocked for more than 721 seconds.
> Tainted: G OE 5.15.185-android13-8-00043-gd00fb6bce7ed-ab13792018 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/1:1 state:D stack: 0 pid: 73 ppid: 2 flags:0x00000008
> Workqueue: events_freezable mmc_rescan.cfi_jt
> Call trace:
> __switch_to+0x234/0x470
> __schedule+0x694/0xb8c
> schedule+0x150/0x254
> schedule_preempt_disabled+0x2c/0x4c
> __mutex_lock+0x360/0xb00
> __mutex_lock_slowpath+0x18/0x28
> mutex_lock+0x48/0x12c
> device_del+0x48/0x8d0
> mmc_remove_card+0x128/0x158
> mmc_sdio_remove+0x190/0x1ac
> mmc_sdio_detect+0x7c/0x118
> mmc_rescan+0xe8/0x42c
> process_one_work+0x248/0x55c
> worker_thread+0x3b0/0x740
> kthread+0x168/0x1dc
> ret_from_fork+0x10/0x20
>
> Signed-off-by: Michael Wu <michael@...winnertech.com>
> ---
> drivers/mmc/core/bus.c | 2 +-
> drivers/mmc/core/core.c | 14 ++++++++++++++
> drivers/mmc/core/core.h | 1 +
> 3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
> index 1cf64e0952fbe..6ff6fcb4c6f27 100644
> --- a/drivers/mmc/core/bus.c
> +++ b/drivers/mmc/core/bus.c
> @@ -149,7 +149,7 @@ static void mmc_bus_shutdown(struct device *dev)
> if (dev->driver && drv->shutdown)
> drv->shutdown(card);
>
> - __mmc_stop_host(host);
> + __mmc_stop_host_no_sync(host);
>
> if (host->bus_ops->shutdown) {
> ret = host->bus_ops->shutdown(host);
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index a0e2dce704343..2d75ad26f84a9 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2336,6 +2336,20 @@ void __mmc_stop_host(struct mmc_host *host)
> cancel_delayed_work_sync(&host->detect);
> }
>
> +void __mmc_stop_host_no_sync(struct mmc_host *host)
> +{
> + if (host->rescan_disable)
> + return;
> +
> + if (host->slot.cd_irq >= 0) {
> + mmc_gpio_set_cd_wake(host, false);
> + disable_irq(host->slot.cd_irq);
> + }
> +
> + host->rescan_disable = 1;
> + /* Skip cancel_delayed_work_sync to avoid potential blocking */
> +}
> +
> void mmc_stop_host(struct mmc_host *host)
> {
> __mmc_stop_host(host);
> diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> index 622085cd766f9..eb59a61717357 100644
> --- a/drivers/mmc/core/core.h
> +++ b/drivers/mmc/core/core.h
> @@ -71,6 +71,7 @@ static inline void mmc_delay(unsigned int ms)
> void mmc_rescan(struct work_struct *work);
> void mmc_start_host(struct mmc_host *host);
> void __mmc_stop_host(struct mmc_host *host);
> +void __mmc_stop_host_no_sync(struct mmc_host *host);
> void mmc_stop_host(struct mmc_host *host);
>
> void _mmc_detect_change(struct mmc_host *host, unsigned long delay,