linux-kernel - Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume and runtime disable

Message-ID: <CAJZ5v0h-Dsi=Z2cRye39PVxgw3fyNdfsZynvzo2QaYrT-nNnow@mail.gmail.com>
Date: Thu, 27 Nov 2025 13:44:50 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: YangYang <yang.yang@...o.com>
Cc: Bart Van Assche <bvanassche@....org>, "Rafael J. Wysocki" <rafael@...nel.org>, Jens Axboe <axboe@...nel.dk>, 
	Pavel Machek <pavel@...nel.org>, Len Brown <lenb@...nel.org>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>, 
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume
 and runtime disable

On Thu, Nov 27, 2025 at 12:29 PM YangYang <yang.yang@...o.com> wrote:
>
> On 2025/11/27 2:40, Bart Van Assche wrote:
> > On 11/26/25 7:41 AM, Rafael J. Wysocki wrote:
> >> As it stands, you have a basic problem with respect to system
> >> suspend/hibernation.  As I said before, the PM workqueue is frozen
> >> during system suspend/hibernation transitions, so waiting for an async
> >> resume request to complete then is pointless.
> >
> > Agreed. I noticed that any attempt to call request_firmware() from
> > driver system resume callback functions causes a deadlock if these
> > calls happen before the block device has been resumed.
> >
> > Thanks,
> >
> > Bart.
>
> Does this patch look reasonable to you? It hasn't been fully tested
> yet, but the resume is now performed synchronously.
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 66fb2071d..041d29ba4 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,12 +323,15 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>                   * reordered.
>                   */
>                  smp_rmb();
> -               wait_event(q->mq_freeze_wq,
> -                          (!q->mq_freeze_depth &&
> -                           blk_pm_resume_queue(pm, q)) ||
> -                          blk_queue_dying(q));
> +check:
> +               wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);

I think you still need to check blk_queue_dying(q) in the wait_event()
condition, or the waiter may never stop waiting once the queue is dying.
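
To illustrate the point, here is a minimal user-space model, not kernel
code. It assumes a single waiter and a scripted list of wake-up events;
struct model_queue, model_wait_event(), model_queue_enter() and the
-EDEADLK "stuck" result are all hypothetical stand-ins for the real
request_queue, wait_event() and blk_queue_enter(). It only shows that a
waiter checking the freeze depth alone stays asleep when the queue is
marked dying while still frozen.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant request_queue state. */
struct model_queue {
	int mq_freeze_depth;	/* non-zero while the queue is frozen */
	bool dying;		/* models blk_queue_dying(q) */
};

/* Scripted events; each one is followed by a wake-up of the waiter. */
enum model_event { EV_UNFREEZE, EV_MARK_DYING };

typedef bool (*wait_cond_fn)(const struct model_queue *q);

/* Condition from the proposed patch: only the freeze depth is checked. */
static bool cond_unfrozen_only(const struct model_queue *q)
{
	return !q->mq_freeze_depth;
}

/* Condition suggested in the review: also stop waiting when dying. */
static bool cond_unfrozen_or_dying(const struct model_queue *q)
{
	return !q->mq_freeze_depth || q->dying;
}

/*
 * Models wait_event(): re-check the condition after every wake-up.
 * Returns false if the events run out while the condition is still
 * false, i.e. the waiter would sleep forever.
 */
static bool model_wait_event(struct model_queue *q, wait_cond_fn cond,
			     const enum model_event *ev, size_t nr_ev)
{
	size_t i = 0;

	while (!cond(q)) {
		if (i == nr_ev)
			return false;
		if (ev[i] == EV_UNFREEZE)
			q->mq_freeze_depth--;
		else
			q->dying = true;
		i++;
	}
	return true;
}

/*
 * Models the start of the enter path on a queue that begins frozen.
 * Returns 0 on success, -ENODEV for a dying queue, and -EDEADLK
 * (model-only) when the waiter would never be released.
 */
static int model_queue_enter(wait_cond_fn cond,
			     const enum model_event *ev, size_t nr_ev)
{
	struct model_queue q = { .mq_freeze_depth = 1, .dying = false };

	if (!model_wait_event(&q, cond, ev, nr_ev))
		return -EDEADLK;
	if (q.dying)
		return -ENODEV;
	return 0;
}
```

With a single EV_MARK_DYING event, cond_unfrozen_only leaves the model
waiter stuck, while cond_unfrozen_or_dying lets it return -ENODEV.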

> +
>                  if (blk_queue_dying(q))
>                          return -ENODEV;
> +               if (!blk_pm_resume_queue(pm, q)) {
> +                       pm_runtime_resume(q->dev);
> +                       goto check;
> +               }
>          }
>
>          rwsem_acquire_read(&q->q_lockdep_map, 0, 0, _RET_IP_);
> @@ -356,12 +359,15 @@ int __bio_queue_enter(struct request_queue *q, struct bio *bio)
>                   * reordered.
>                   */
>                  smp_rmb();
> -               wait_event(q->mq_freeze_wq,
> -                          (!q->mq_freeze_depth &&
> -                           blk_pm_resume_queue(false, q)) ||
> -                          test_bit(GD_DEAD, &disk->state));
> +check:
> +               wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);

Analogously here, test_bit(GD_DEAD, &disk->state) needs to stay in the
wait_event() condition, or you may not stop waiting when it becomes true.

> +
>                  if (test_bit(GD_DEAD, &disk->state))
>                          goto dead;
> +               if (!blk_pm_resume_queue(false, q)) {
> +                       pm_runtime_resume(q->dev);
> +                       goto check;
> +               }
>          }
>
>          rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
> diff --git a/block/blk-pm.h b/block/blk-pm.h
> index 8a5a0d4b3..c28fad105 100644
> --- a/block/blk-pm.h
> +++ b/block/blk-pm.h
> @@ -12,7 +12,6 @@ static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
>                  return 1;       /* Nothing to do */
>          if (pm && q->rpm_status != RPM_SUSPENDED)
>                  return 1;       /* Request allowed */
> -       pm_request_resume(q->dev);
>          return 0;
>   }

And I would rename blk_pm_resume_queue() to something like
blk_pm_queue_active() because it is a bit confusing as it stands.
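
As a rough illustration of how the renamed helper could read, here is
a user-space model, not kernel code. struct model_queue, its
nothing_to_do flag (standing in for the first check, which the diff
context does not show), the MODEL_RPM_* values, model_runtime_resume()
and model_enter() are all hypothetical. The model further assumes that
a synchronous resume always succeeds and leaves the queue with nothing
further to do.

```c
#include <stdbool.h>

enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

/* Hypothetical stand-in for the relevant request_queue state. */
struct model_queue {
	bool nothing_to_do;	/* stands in for the check elided from the diff */
	enum model_rpm_status rpm_status;
	int resume_calls;	/* model-only counter of synchronous resumes */
};

/* Pure predicate: reports state and no longer requests a resume. */
static inline bool blk_pm_queue_active(bool pm, const struct model_queue *q)
{
	if (q->nothing_to_do)
		return true;	/* Nothing to do */
	if (pm && q->rpm_status != MODEL_RPM_SUSPENDED)
		return true;	/* Request allowed */
	return false;
}

/* Stand-in for pm_runtime_resume(q->dev); assumed to always succeed. */
static void model_runtime_resume(struct model_queue *q)
{
	q->rpm_status = MODEL_RPM_ACTIVE;
	q->nothing_to_do = true;
	q->resume_calls++;
}

/* Caller side: resume synchronously, then re-check the predicate. */
static void model_enter(bool pm, struct model_queue *q)
{
	while (!blk_pm_queue_active(pm, q))
		model_runtime_resume(q);
}
```

The caller then reads as "if the queue is not active, resume it and
check again", with the side effect kept outside the predicate.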

Apart from the above remarks, this makes sense to me FWIW.