Message-ID: <CAJZ5v0jiLAgHCQ51cYqUX-xjir7ooAC3xKH9wMbwrebOEuxFdw@mail.gmail.com>
Date: Wed, 26 Nov 2025 20:16:38 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Bart Van Assche <bvanassche@....org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Yang Yang <yang.yang@...o.com>, Jens Axboe <axboe@...nel.dk>,
Pavel Machek <pavel@...nel.org>, Len Brown <lenb@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume
and runtime disable
On Wed, Nov 26, 2025 at 7:06 PM Bart Van Assche <bvanassche@....org> wrote:
>
> On 11/26/25 3:30 AM, Rafael J. Wysocki wrote:
> > On Wed, Nov 26, 2025 at 11:17 AM Yang Yang <yang.yang@...o.com> wrote:
> >> T1: T2:
> >> blk_queue_enter
> >> blk_pm_resume_queue
> >> pm_request_resume
> >
> > Shouldn't this be pm_runtime_resume() rather?
>
> I tried to make that change on an Android device. As a result, the
> kernel complaint shown below appeared. My understanding is that sleeping
> in atomic context can trigger a deadlock and hence is not allowed.
>
> [ 13.728890][ T1] WARNING: CPU: 6 PID: 1 at
> kernel/sched/core.c:9714 __might_sleep+0x78/0x84
> [ 13.758800][ T1] Call trace:
> [ 13.759027][ T1] __might_sleep+0x78/0x84
> [ 13.759340][ T1] __pm_runtime_resume+0x40/0xb8
> [ 13.759781][ T1] __bio_queue_enter+0xc0/0x1cc
> [ 13.760153][ T1] blk_mq_submit_bio+0x884/0xadc
> [ 13.760548][ T1] __submit_bio+0x2c8/0x49c
> [ 13.760879][ T1] __submit_bio_noacct_mq+0x38/0x88
> [ 13.761242][ T1] submit_bio_noacct_nocheck+0x4fc/0x7b8
> [ 13.761631][ T1] submit_bio+0x214/0x4c0
> [ 13.761941][ T1] mpage_readahead+0x1b8/0x1fc
> [ 13.762284][ T1] blkdev_readahead+0x18/0x28
> [ 13.762660][ T1] page_cache_ra_unbounded+0x310/0x4d8
> [ 13.763072][ T1] page_cache_ra_order+0xc0/0x5b0
> [ 13.763434][ T1] page_cache_sync_ra+0x17c/0x268
> [ 13.763782][ T1] filemap_read+0x4c4/0x12f4
> [ 13.764125][ T1] blkdev_read_iter+0x100/0x164
> [ 13.764475][ T1] vfs_read+0x188/0x348
> [ 13.764789][ T1] __se_sys_pread64+0x84/0xc8
> [ 13.765180][ T1] __arm64_sys_pread64+0x1c/0x2c
> [ 13.765556][ T1] invoke_syscall+0x58/0xf0
> [ 13.765876][ T1] do_el0_svc+0x8c/0xe0
> [ 13.766172][ T1] el0_svc+0x50/0xd4
> [ 13.766583][ T1] el0t_64_sync_handler+0x20/0xf4
> [ 13.766932][ T1] el0t_64_sync+0x1bc/0x1c0
> [ 13.767294][ T1] irq event stamp: 2589614
> [ 13.767592][ T1] hardirqs last enabled at (2589613):
> [<ffffffc0800eaf24>] finish_lock_switch+0x70/0x108
> [ 13.768283][ T1] hardirqs last disabled at (2589614):
> [<ffffffc0814b66f4>] el1_dbg+0x24/0x80
> [ 13.768875][ T1] softirqs last enabled at (2589370):
> [<ffffffc080082a7c>] ____do_softirq+0x10/0x20
> [ 13.769529][ T1] softirqs last disabled at (2589349):
> [<ffffffc080082a7c>] ____do_softirq+0x10/0x20
>
> I think that the filemap_invalidate_lock_shared() call in
> page_cache_ra_unbounded() forbids sleeping in submit_bio().

The wait_event() macro in __bio_queue_enter() calls might_sleep() at
the very beginning, so why would it not complain?

IIUC, this is the WARN_ONCE() in __might_sleep() about the task state
being different from TASK_RUNNING, which triggers because
prepare_to_wait_event() changes the task state to
TASK_UNINTERRUPTIBLE.

This means that calling pm_runtime_resume() cannot be part of the
wait_event() condition, so blk_pm_resume_queue() and the wait_event()
macros involving it would need some rewriting.
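
To illustrate the ordering problem described above, here is a small
userspace model in C. It is only a sketch, not kernel code: every
model_* name, the "frozen" and "suspended" flags, and the restructured
variant are assumptions made for illustration and are not taken from
the block layer or from this thread. The model mirrors the order of
operations in wait_event(): the sleep check runs first, the condition
is evaluated once, and after that the condition is re-evaluated only
once the task state has been changed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; none of these are the kernel's definitions. */
enum model_task_state { MODEL_TASK_RUNNING, MODEL_TASK_UNINTERRUPTIBLE };

struct model_queue {
	bool frozen;    /* stands in for a nonzero freeze depth */
	bool suspended; /* stands in for a runtime-suspended queue */
};

static enum model_task_state model_state = MODEL_TASK_RUNNING;
static int model_warnings;

/* Like the task state check in __might_sleep(): warn if not running. */
static void model_might_sleep(const char *who)
{
	if (model_state != MODEL_TASK_RUNNING) {
		model_warnings++;
		printf("WARNING: %s called with task state != running\n", who);
	}
}

/* Stand-in for a synchronous resume, which may sleep. */
static bool model_sync_resume(struct model_queue *q)
{
	model_might_sleep("model_sync_resume");
	q->suspended = false;
	return true;
}

/* Stand-in for schedule(): another task unfreezes the queue and wakes us. */
static void model_schedule(struct model_queue *q)
{
	q->frozen = false;
	model_state = MODEL_TASK_RUNNING;
}

typedef bool (*model_cond_fn)(struct model_queue *q);

/* Same ordering as wait_event(): sleep check, condition, then the wait loop. */
static void model_wait_event(struct model_queue *q, model_cond_fn cond)
{
	model_might_sleep("model_wait_event"); /* state is still running here */
	if (cond(q))
		return;
	for (;;) {
		model_state = MODEL_TASK_UNINTERRUPTIBLE; /* prepare_to_wait_event() */
		if (cond(q))
			break;
		model_schedule(q);
	}
	model_state = MODEL_TASK_RUNNING; /* finish_wait() */
}

static bool model_cond_with_resume(struct model_queue *q)
{
	return !q->frozen && model_sync_resume(q);
}

static bool model_cond_unfrozen(struct model_queue *q)
{
	return !q->frozen;
}

/* Sleeping resume inside the condition; returns the number of warnings. */
static int model_enter_resume_in_condition(void)
{
	struct model_queue q = { .frozen = true, .suspended = true };

	model_state = MODEL_TASK_RUNNING;
	model_warnings = 0;
	model_wait_event(&q, model_cond_with_resume);
	return model_warnings;
}

/* Hypothetical restructuring: wait on a non-sleeping condition only and
 * resume after the wait, when the task state is running again. */
static int model_enter_resume_outside_condition(void)
{
	struct model_queue q = { .frozen = true, .suspended = true };

	model_state = MODEL_TASK_RUNNING;
	model_warnings = 0;
	do {
		model_wait_event(&q, model_cond_unfrozen);
		model_sync_resume(&q);
	} while (q.frozen || q.suspended);
	return model_warnings;
}
```

In this model, model_enter_resume_in_condition() records one warning,
raised when the resume runs from the condition after the state change,
while the check at the top of model_wait_event() stays silent.
model_enter_resume_outside_condition() records none. Whether a loop of
that shape would be acceptable in __bio_queue_enter() is a separate
question that the model does not answer.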