linux-kernel - Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume and runtime disable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJZ5v0i+BhxX54wyogVR4_fmTJHVFfozNrP5LN4pGDPnnL=EDQ@mail.gmail.com>
Date: Mon, 1 Dec 2025 19:55:27 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: YangYang <yang.yang@...o.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Bart Van Assche <bvanassche@....org>, Jens Axboe <axboe@...nel.dk>, 
	Pavel Machek <pavel@...nel.org>, Len Brown <lenb@...nel.org>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>, 
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume
 and runtime disable

On Mon, Dec 1, 2025 at 1:56 PM YangYang <yang.yang@...o.com> wrote:
>
> On 2025/12/1 17:46, YangYang wrote:
> > On 2025/11/27 20:34, Rafael J. Wysocki wrote:
> >> On Wed, Nov 26, 2025 at 11:47 PM Bart Van Assche <bvanassche@....org> wrote:
> >>>
> >>> On 11/26/25 1:30 PM, Rafael J. Wysocki wrote:
> >>>> On Wed, Nov 26, 2025 at 10:11 PM Bart Van Assche <bvanassche@....org> wrote:
> >>>>>
> >>>>> On 11/26/25 12:17 PM, Rafael J. Wysocki wrote:
> >>>>>> --- a/block/blk-core.c
> >>>>>> +++ b/block/blk-core.c
> >>>>>> @@ -309,6 +309,8 @@ int blk_queue_enter(struct request_queue
> >>>>>>                 if (flags & BLK_MQ_REQ_NOWAIT)
> >>>>>>                         return -EAGAIN;
> >>>>>>
> >>>>>> +             /* if necessary, resume .dev (assume success). */
> >>>>>> +             blk_pm_resume_queue(pm, q);
> >>>>>>                 /*
> >>>>>>                  * read pair of barrier in blk_freeze_queue_start(), we need to
> >>>>>>                  * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
> >>>>>
> >>>>> blk_queue_enter() may be called from the suspend path so I don't think
> >>>>> that the above change will work.
> >>>>
> >>>> Why would the existing code work then?
> >>>
> >>> The existing code works reliably on a very large number of devices.
> >>
> >> Well, except that it doesn't work during system suspend and
> >> hibernation when the PM workqueue is frozen.  I think that we agree
> >> here.
> >>
> >> This needs to be addressed because it may very well cause system
> >> suspend to deadlock.
> >>
> >> There are two possible ways to address it I can think of:
> >>
> >> 1. Changing blk_pm_resume_queue() and its users to carry out a
> >> synchronous resume of q->dev instead of calling pm_request_resume()
> >> and (effectively) waiting for the queued-up runtime resume of q->dev
> >> to take effect.
> >>
> >> This would be my preferred option, but at this point I'm not sure if
> >> it's viable.
> >>
> >
> > After __pm_runtime_disable() is called from device_suspend_late(), dev->power.disable_depth is set, preventing
> > rpm_resume() from making progress until the system resume completes, regardless of whether rpm_resume() is invoked
> > synchronously or asynchronously.
> > Performing a synchronous resume of q->dev seems to have a similar effect to removing the following code block from
> > __pm_runtime_barrier(), which is invoked by __pm_runtime_disable():
> >
> > 1428     if (dev->power.request_pending) {
> > 1429         dev->power.request = RPM_REQ_NONE;
> > 1430         spin_unlock_irq(&dev->power.lock);
> > 1431
> > 1432         cancel_work_sync(&dev->power.work);
> > 1433
> > 1434         spin_lock_irq(&dev->power.lock);
> > 1435         dev->power.request_pending = false;
> > 1436     }
> >
>
> Since both synchronous and asynchronous resumes face similar issues,

No, they don't.

> it may be sufficient to keep using the asynchronous resume path as long as
> pending work items are not canceled while the PM workqueue is frozen.

Except for two things:

1. If blk_queue_enter() or __bio_queue_enter() is allowed to race with
disabling runtime PM, queuing up the resume work item may fail in the
first place.

2. If a device runtime resume work item is queued up before the whole
system is suspended, it may not make sense to run that work item after
resuming the whole system because the state of the system as a whole
is generally different at that point.

> This allows the pending work to proceed normally once the PM workqueue
> is unfrozen.

Not really.