Message-ID: <CAJZ5v0h-Dsi=Z2cRye39PVxgw3fyNdfsZynvzo2QaYrT-nNnow@mail.gmail.com>
Date: Thu, 27 Nov 2025 13:44:50 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: YangYang <yang.yang@...o.com>
Cc: Bart Van Assche <bvanassche@....org>, "Rafael J. Wysocki" <rafael@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Pavel Machek <pavel@...nel.org>, Len Brown <lenb@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume and runtime disable
On Thu, Nov 27, 2025 at 12:29 PM YangYang <yang.yang@...o.com> wrote:
>
> On 2025/11/27 2:40, Bart Van Assche wrote:
> > On 11/26/25 7:41 AM, Rafael J. Wysocki wrote:
> >> As it stands, you have a basic problem with respect to system
> >> suspend/hibernation. As I said before, the PM workqueue is frozen
> >> during system suspend/hibernation transitions, so waiting for an async
> >> resume request to complete then is pointless.
> >
> > Agreed. I noticed that any attempt to call request_firmware() from
> > driver system resume callback functions causes a deadlock if these
> > calls happen before the block device has been resumed.
> >
> > Thanks,
> >
> > Bart.
>
> Does this patch look reasonable to you? It hasn't been fully tested
> yet, but the resume is now performed synchronously.
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 66fb2071d..041d29ba4 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,12 +323,15 @@ int blk_queue_enter(struct request_queue *q,
> blk_mq_req_flags_t flags)
> * reordered.
> */
> smp_rmb();
> - wait_event(q->mq_freeze_wq,
> - (!q->mq_freeze_depth &&
> - blk_pm_resume_queue(pm, q)) ||
> - blk_queue_dying(q));
> +check:
> + wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);

I think you still need to check blk_queue_dying(q) in the wait_event()
condition, or the wait may never end once the queue is marked dying.

> +
> if (blk_queue_dying(q))
> return -ENODEV;
> + if (!blk_pm_resume_queue(pm, q)) {
> + pm_runtime_resume(q->dev);
> + goto check;
> + }
> }
>
> rwsem_acquire_read(&q->q_lockdep_map, 0, 0, _RET_IP_);
> @@ -356,12 +359,15 @@ int __bio_queue_enter(struct request_queue *q,
> struct bio *bio)
> * reordered.
> */
> smp_rmb();
> - wait_event(q->mq_freeze_wq,
> - (!q->mq_freeze_depth &&
> - blk_pm_resume_queue(false, q)) ||
> - test_bit(GD_DEAD, &disk->state));
> +check:
> + wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);

Likewise here: without test_bit(GD_DEAD, &disk->state) in the
wait_event() condition, the wait may not end when the disk is marked
dead.

> +
> if (test_bit(GD_DEAD, &disk->state))
> goto dead;
> + if (!blk_pm_resume_queue(false, q)) {
> + pm_runtime_resume(q->dev);
> + goto check;
> + }
> }
>
> rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> diff --git a/block/blk-pm.h b/block/blk-pm.h
> index 8a5a0d4b3..c28fad105 100644
> --- a/block/blk-pm.h
> +++ b/block/blk-pm.h
> @@ -12,7 +12,6 @@ static inline int blk_pm_resume_queue(const bool pm,
> struct request_queue *q)
> return 1; /* Nothing to do */
> if (pm && q->rpm_status != RPM_SUSPENDED)
> return 1; /* Request allowed */
> - pm_request_resume(q->dev);
> return 0;
> }

And I would rename blk_pm_resume_queue() to something like
blk_pm_queue_active(), because the current name is a bit confusing.

Apart from the above remarks, this makes sense to me FWIW.
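
To make the two wait_event() remarks concrete, here is a rough userspace model of the control flow, not kernel code and not tested against the real patch. Everything prefixed with model_ is hypothetical: struct model_queue stands in for the few request_queue fields involved, model_pm_queue_active() for the renamed helper (the pm argument and the pm-only checks are deliberately left out), and model_runtime_resume() for a synchronous pm_runtime_resume(). Since nothing can sleep here, a wait_event() that would keep sleeping is reported as -EWOULDBLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the request_queue state used by the patch. */
struct model_queue {
	int freeze_depth;	/* models q->mq_freeze_depth */
	bool dying;		/* models blk_queue_dying(q) */
	bool suspended;		/* models q->rpm_status == RPM_SUSPENDED */
	int resume_calls;	/* number of synchronous resumes performed */
};

/* Models the renamed predicate: true when I/O may proceed. */
static bool model_pm_queue_active(const struct model_queue *q)
{
	return !q->suspended;
}

/* Models a synchronous runtime resume that always succeeds. */
static void model_runtime_resume(struct model_queue *q)
{
	q->suspended = false;
	q->resume_calls++;
}

/*
 * Models the reworked loop. dying_in_wait_cond selects whether the
 * "dying" check is part of the wait condition (as suggested above)
 * or only tested after the wait returns (as in the posted diff).
 */
static int model_queue_enter(struct model_queue *q, bool dying_in_wait_cond)
{
	assert(q != NULL);

	for (;;) {
		bool wake = !q->freeze_depth ||
			    (dying_in_wait_cond && q->dying);

		if (!wake)
			return -EWOULDBLOCK;	/* the kernel would keep sleeping */
		if (q->dying)
			return -ENODEV;
		if (model_pm_queue_active(q))
			return 0;
		model_runtime_resume(q);	/* then re-check everything */
	}
}
```

In this model, a queue that is both frozen and dying returns -ENODEV when dying_in_wait_cond is true, but -EWOULDBLOCK (standing for a wait that never ends) when it is false. A suspended, unfrozen, live queue is resumed once and then entered.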