Message-ID: <CAPDyKFpdYhp5Go7_gSh=A0q3kxHs_gcBsUi6wc8sMs5bZW2JFA@mail.gmail.com>
Date: Fri, 26 Sep 2025 15:09:59 +0200
From: Ulf Hansson <ulf.hansson@...aro.org>
To: Michael Wu <michael@...winnertech.com>
Cc: linus.walleij@...aro.org, brgl@...ev.pl, adrian.hunter@...el.com,
avri.altman@....com, wsa+renesas@...g-engineering.com,
andy-ld.lu@...iatek.com, victor.shih@...esyslogic.com.tw,
linux-mmc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-gpio@...r.kernel.org
Subject: Re: [PATCH] mmc: core: Fix system shutdown hang in mmc_bus_shutdown
On Fri, 26 Sept 2025 at 04:49, Michael Wu <michael@...winnertech.com> wrote:
>
> During system shutdown, mmc_bus_shutdown() calls __mmc_stop_host() which
> uses cancel_delayed_work_sync(). This can block indefinitely if the work
> queue is stuck, causing the system to hang during shutdown.

Why exactly is it stuck?

Looking at the trace below, it seems we are failing to remove an SDIO
card. Why?

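One possible reading of the two traces, which is an assumption and not
something confirmed in this thread: device_shutdown() holds the card's
device lock while mmc_bus_shutdown() waits for the rescan work to finish,
and the rescan work is itself waiting for that same lock in device_del().
The userspace sketch below only models that shape with pthreads. The names
card_lock, rescan_work() and shutdown_path() are hypothetical stand-ins, not
kernel code, and the worker uses a 200 ms timed lock so the demo terminates
instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the card's device lock. */
static pthread_mutex_t card_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the rescan work removing the card: it needs card_lock.
 * A real mutex_lock() would wait forever; the timed lock lets the
 * demo report "would block" (ETIMEDOUT) instead of hanging.
 */
static void *rescan_work(void *unused)
{
	struct timespec deadline;
	int ret;

	(void)unused;
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200L * 1000L * 1000L;	/* 200 ms */
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	ret = pthread_mutex_timedlock(&card_lock, &deadline);
	assert(ret == 0 || ret == ETIMEDOUT);
	if (ret == 0)
		pthread_mutex_unlock(&card_lock);

	return (void *)(intptr_t)ret;
}

/*
 * Models the shutdown path: optionally keep card_lock held while
 * waiting for the work (pthread_join() plays the role of the
 * synchronous cancel). Returns 0 if the work completed, ETIMEDOUT
 * if the work could not get the lock.
 */
static int shutdown_path(int hold_lock_while_waiting)
{
	pthread_t worker;
	void *res = NULL;
	int ret;

	pthread_mutex_lock(&card_lock);
	if (!hold_lock_while_waiting)
		pthread_mutex_unlock(&card_lock);

	ret = pthread_create(&worker, NULL, rescan_work, NULL);
	if (ret == 0) {
		pthread_join(worker, &res);
		ret = (int)(intptr_t)res;
	}

	if (hold_lock_while_waiting)
		pthread_mutex_unlock(&card_lock);

	return ret;
}
```

In this model, shutdown_path(1) reports ETIMEDOUT because the waiter and the
worker each need what the other holds, while shutdown_path(0) returns 0.
Whether the real hang has this cause would still need to be confirmed.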
>
> This patch introduces a new function __mmc_stop_host_no_sync() that skips
> the synchronous work cancellation, preventing potential shutdown hangs.
> The function is used in mmc_bus_shutdown() where blocking is not
> acceptable during system shutdown.

This isn't the only thing that can block in mmc_bus_shutdown().

With this change, I am worried that we may send the power-off
notification to an eMMC/SD card when it is not safe to do so. But
perhaps there is no other way?

Kind regards
Uffe
>
> Changes:
> - Add __mmc_stop_host_no_sync() function that avoids cancel_delayed_work_sync()
> - Update mmc_bus_shutdown() to use the new non-blocking function
> - Keep the original __mmc_stop_host() unchanged for normal operation
>
> This ensures graceful system shutdown while maintaining existing
> functionality for regular MMC host operations.
>
> stack information when an error occurs:
> INFO: task init:1 blocked for more than 720 seconds.
> Tainted: G OE 5.15.185-android13-8-00043-gd00fb6bce7ed-ab13792018 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:init state:D stack: 0 pid: 1 ppid: 0 flags:0x04000008
> Call trace:
> __switch_to+0x234/0x470
> __schedule+0x694/0xb8c
> schedule+0x150/0x254
> schedule_timeout+0x48/0x138
> wait_for_common+0x144/0x308
> __flush_work+0x3d8/0x508
> __cancel_work_timer+0x120/0x2e8
> mmc_bus_shutdown+0x90/0x158
> device_shutdown+0x204/0x434
> kernel_restart+0x54/0x220
> kernel_restart+0x0/0x220
> invoke_syscall+0x60/0x150
> el0_svc_common+0xb8/0xf8
> do_el0_svc+0x28/0x98
> el0_svc+0x24/0x84
> el0t_64_sync_handler+0x88/0xec
> el0t_64_sync+0x1b8/0x1bc
> INFO: task kworker/1:1:73 blocked for more than 721 seconds.
> Tainted: G OE 5.15.185-android13-8-00043-gd00fb6bce7ed-ab13792018 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/1:1 state:D stack: 0 pid: 73 ppid: 2 flags:0x00000008
> Workqueue: events_freezable mmc_rescan.cfi_jt
> Call trace:
> __switch_to+0x234/0x470
> __schedule+0x694/0xb8c
> schedule+0x150/0x254
> schedule_preempt_disabled+0x2c/0x4c
> __mutex_lock+0x360/0xb00
> __mutex_lock_slowpath+0x18/0x28
> mutex_lock+0x48/0x12c
> device_del+0x48/0x8d0
> mmc_remove_card+0x128/0x158
> mmc_sdio_remove+0x190/0x1ac
> mmc_sdio_detect+0x7c/0x118
> mmc_rescan+0xe8/0x42c
> process_one_work+0x248/0x55c
> worker_thread+0x3b0/0x740
> kthread+0x168/0x1dc
> ret_from_fork+0x10/0x20
>
> Signed-off-by: Michael Wu <michael@...winnertech.com>
> ---
> drivers/mmc/core/bus.c | 2 +-
> drivers/mmc/core/core.c | 14 ++++++++++++++
> drivers/mmc/core/core.h | 1 +
> 3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c
> index 1cf64e0952fbe..6ff6fcb4c6f27 100644
> --- a/drivers/mmc/core/bus.c
> +++ b/drivers/mmc/core/bus.c
> @@ -149,7 +149,7 @@ static void mmc_bus_shutdown(struct device *dev)
> if (dev->driver && drv->shutdown)
> drv->shutdown(card);
>
> - __mmc_stop_host(host);
> + __mmc_stop_host_no_sync(host);
>
> if (host->bus_ops->shutdown) {
> ret = host->bus_ops->shutdown(host);
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index a0e2dce704343..2d75ad26f84a9 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -2336,6 +2336,20 @@ void __mmc_stop_host(struct mmc_host *host)
> cancel_delayed_work_sync(&host->detect);
> }
>
> +void __mmc_stop_host_no_sync(struct mmc_host *host)
> +{
> + if (host->rescan_disable)
> + return;
> +
> + if (host->slot.cd_irq >= 0) {
> + mmc_gpio_set_cd_wake(host, false);
> + disable_irq(host->slot.cd_irq);
> + }
> +
> + host->rescan_disable = 1;
> + /* Skip cancel_delayed_work_sync to avoid potential blocking */
> +}
> +
> void mmc_stop_host(struct mmc_host *host)
> {
> __mmc_stop_host(host);
> diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> index 622085cd766f9..eb59a61717357 100644
> --- a/drivers/mmc/core/core.h
> +++ b/drivers/mmc/core/core.h
> @@ -71,6 +71,7 @@ static inline void mmc_delay(unsigned int ms)
> void mmc_rescan(struct work_struct *work);
> void mmc_start_host(struct mmc_host *host);
> void __mmc_stop_host(struct mmc_host *host);
> +void __mmc_stop_host_no_sync(struct mmc_host *host);
> void mmc_stop_host(struct mmc_host *host);
>
> void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> --
> 2.29.0
>