Message-ID: <CAJZ5v0haFgyoXqapvpESUec0_Pxw-uckTGSpVOWDQPbxWU-=Dg@mail.gmail.com>
Date: Tue, 2 Dec 2025 14:37:19 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Bart Van Assche <bvanassche@....org>
Cc: YangYang <yang.yang@...o.com>, Jens Axboe <axboe@...nel.dk>, Pavel Machek <pavel@...nel.org>, 
	Len Brown <lenb@...nel.org>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	Danilo Krummrich <dakr@...nel.org>, linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume
 and runtime disable

On Tue, Dec 2, 2025 at 1:14 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Tue, Dec 2, 2025 at 1:41 AM Bart Van Assche <bvanassche@....org> wrote:
> >
> > On 12/1/25 10:47 AM, Rafael J. Wysocki wrote:
> > > Generally speaking, if blk_queue_enter() or __bio_queue_enter() may
> > > run in parallel with device_suspend_late() for q->dev, the driver of
> > > that device is defective, because it is responsible for preventing
> > > this situation from happening.  The most straightforward way to
> > > achieve that is to provide a .suspend() callback for q->dev that will
> > > runtime-resume it (and, of course, q->dev will need to be prepared for
> > > system suspend as appropriate after that).
> >
> > Isn't the suspend / hibernation order such that no block I/O is
> > submitted while block devices transition to a lower power state? I'm
> > surprised to read that individual drivers are responsible for preventing
> > that blk_queue_enter() or __bio_queue_enter() run concurrently with
> > device_suspend_late().
>
> To be more precise, they don't need to be prevented from running
> concurrently with device_suspend_late() in general.  The driver needs
> to ensure though that q->dev is not runtime-suspended in
> device_suspend_late() if blk_queue_enter() or __bio_queue_enter() are
> expected to run in parallel with it or later.
>
> > Regarding the UFSHCI driver: if a UFS controller is already runtime
> > suspended, we want it to remain suspended during system suspend.
>
> That can be done, but still the driver is responsible for preparing
> the device for system suspend.
>
> The most popular strategy is to use pm_runtime_force_suspend/resume()
> as driver suspend callbacks for the device, either as
> .suspend()/.resume() or as .suspend_late()/.resume_early(),
> respectively.  In both cases, runtime PM will be disabled and runtime
> PM callbacks will be used for stopping the device - or not, if it is
> suspended already - but after that it must not be accessed in any way
> until the resume part runs.

One more thing that needs to be said here: The PM core expects the
decision on whether or not to leave a runtime-suspended device in
suspend across system-wide suspend-resume to be made before
device_suspend_late() is called for that device.  If the device is
suspended at that point, the expectation is that it will be left in
suspend.  Otherwise, the expectation is that it will be taken care of
by the .suspend_late() and .suspend_noirq() callbacks (and this goes
beyond runtime PM, quite obviously).
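
For illustration only, the pm_runtime_force_suspend/resume() strategy
quoted above could be wired up roughly as in the sketch below.  The
"foo" names and the empty callback bodies are hypothetical placeholders,
not taken from any driver discussed in this thread, and the fragment
only builds inside a kernel tree.  It assumes a kernel recent enough to
provide the LATE_SYSTEM_SLEEP_PM_OPS() and RUNTIME_PM_OPS() macros.

```c
#include <linux/pm.h>
#include <linux/pm_runtime.h>

/* Hypothetical runtime PM callbacks for an illustrative "foo" device. */
static int foo_runtime_suspend(struct device *dev)
{
	/* Quiesce the hardware and save its state here. */
	return 0;
}

static int foo_runtime_resume(struct device *dev)
{
	/* Restore the saved state and bring the hardware back up here. */
	return 0;
}

/*
 * System-wide suspend reuses the runtime PM callbacks:
 * pm_runtime_force_suspend() disables runtime PM and invokes
 * foo_runtime_suspend() unless the device is already runtime-suspended.
 * After that the device must not be accessed until
 * pm_runtime_force_resume() has run.
 */
static const struct dev_pm_ops foo_pm_ops = {
	LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
				 pm_runtime_force_resume)
	RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};
```

The driver would then point its .pm field at foo_pm_ops, typically
through pm_ptr().  Using SYSTEM_SLEEP_PM_OPS() in place of
LATE_SYSTEM_SLEEP_PM_OPS() gives the .suspend()/.resume() variant of
the same approach.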
