Message-ID: <CAPDyKFo_GQrUTrKgScBHo=MuRVP7dra9_NoDxHq8sVb7=sqCCA@mail.gmail.com>
Date: Tue, 2 Dec 2025 15:58:50 +0100
From: Ulf Hansson <ulf.hansson@...aro.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: YangYang <yang.yang@...o.com>, Bart Van Assche <bvanassche@....org>, Jens Axboe <axboe@...nel.dk>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>, 
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-pm@...r.kernel.org
Subject: Re: [PATCH v1] PM: sleep: Do not flag runtime PM workqueue as freezable

On Mon, 1 Dec 2025 at 20:58, Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Monday, December 1, 2025 7:47:46 PM CET Rafael J. Wysocki wrote:
> > On Mon, Dec 1, 2025 at 10:46 AM YangYang <yang.yang@...o.com> wrote:
>
> [cut]
>
> > If blk_queue_enter() or __bio_queue_enter() is allowed to race with
> > disabling runtime PM for q->dev, failure to resume q->dev is always
> > possible and there are no changes that can be made to
> > pm_runtime_disable() to prevent that from happening.  If
> > __pm_runtime_disable() wins the race, it will increment
> > power.disable_depth and rpm_resume() will bail out when it sees that,
> > no matter what.
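
For reference, a minimal sketch of that bail-out path (simplified, not the
actual drivers/base/power/runtime.c code; the function name below is made
up for illustration):

    #include <linux/device.h>
    #include <linux/errno.h>

    /* Illustrative only: once __pm_runtime_disable() has bumped
     * power.disable_depth, a later resume attempt is refused. */
    static int rpm_resume_sketch(struct device *dev)
    {
            if (dev->power.disable_depth > 0)
                    return -EACCES;         /* runtime PM is disabled */

            /* ... the normal resume path would run here ... */
            return 0;
    }
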
> >
> > You should not conflate "runtime PM doesn't work when it is disabled"
> > with "asynchronous runtime PM doesn't work after freezing the PM
> > workqueue".  They are both true, but they are not the same.
>
> So I've been testing the patch below for a few days and it will eliminate
> the latter, but even after this patch runtime PM will be disabled in
> device_suspend_late(), and if the problem you are facing is still there
> after this patch, it will need to be dealt with at the driver level.
>
> Generally speaking, driver involvement is needed to make runtime PM and
> system suspend/resume work together in the majority of cases.
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> Subject: PM: sleep: Do not flag runtime PM workqueue as freezable
>
> Till now, the runtime PM workqueue has been flagged as freezable, so it
> does not process work items during system-wide PM transitions like
> system suspend and resume.  The original reason to do that was to
> reduce the likelihood of runtime PM getting in the way of system-wide
> PM processing, but now it is mostly an optimization because (1) runtime
> suspend of devices is prevented by bumping up their runtime PM usage
> counters in device_prepare() and (2) device drivers are expected to
> disable runtime PM for the devices handled by them before they embark
> on system-wide PM activities that may change the state of the hardware
> or otherwise interfere with runtime PM.  However, it prevents
> asynchronous runtime resume of devices from working during system-wide
> PM transitions, which is confusing because synchronous runtime resume
> is not prevented at the same time, and it also sometimes turns out to
> be problematic.
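
To illustrate the asymmetry just described (both helpers are part of the
standard runtime PM API; dev here is a hypothetical struct device pointer):

    /* Asynchronous resume: queues a work item on the runtime PM
     * workqueue (pm_wq), so it is deferred while pm_wq is frozen
     * during a system-wide transition -- the behaviour this patch
     * removes. */
    pm_request_resume(dev);

    /* Synchronous resume: runs in the caller's context and is not
     * affected by the freezing of pm_wq either way. */
    pm_runtime_resume(dev);
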
>
> For example, it has been reported that blk_queue_enter() may deadlock
> during a system suspend transition because of the pm_request_resume()
> usage in it [1].  That happens because the asynchronous runtime resume
> of the given device is not processed due to the freezing of the runtime
> PM workqueue.  While it may be better to address this particular issue
> in the block layer, the very presence of it means that similar problems
> may be expected to occur elsewhere.
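
The shape of that deadlock, as a rough sketch rather than the actual
blk_queue_enter() code (queue_usable() is a placeholder for the real wait
condition):

    static void enter_queue_sketch(struct request_queue *q)
    {
            /* Kick an asynchronous resume; the work item goes to pm_wq. */
            pm_request_resume(q->dev);

            /* Wait until the queue can be used again.  If pm_wq is
             * frozen, the resume work never runs during the suspend
             * transition, so this wait cannot make progress. */
            wait_event(q->mq_freeze_wq, queue_usable(q));
    }
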
>
> For this reason, remove the WQ_FREEZABLE flag from the runtime PM
> workqueue and make device_suspend_late() use the generic variant of
> pm_runtime_disable(), which carries out a runtime resume of the device
> synchronously if there is pending resume work for it.
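
For context, the generic variant is just a thin wrapper, roughly as in
include/linux/pm_runtime.h:

    /* check_resume == true: if an asynchronous resume request is
     * pending, carry out the resume synchronously before disabling
     * runtime PM.  Before this change, device_suspend_late() called
     * __pm_runtime_disable(dev, false) directly, skipping that check. */
    static inline void pm_runtime_disable(struct device *dev)
    {
            __pm_runtime_disable(dev, true);
    }
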
>
> Also update the comment before the pm_runtime_disable() call in
> device_suspend_late() to document the fact that runtime PM should
> not be expected to work for the device until the end of
> device_resume_early().
>
> This change may, even though it is not expected to, uncover some
> latent issues related to queuing up asynchronous runtime resume
> work items during system suspend or hibernation.  However, they
> should be limited to the interference between runtime resume and
> system-wide PM callbacks in the cases when device drivers start
> to handle system-wide PM before disabling runtime PM as described
> above.
>
> Link: https://lore.kernel.org/linux-pm/20251126101636.205505-2-yang.yang@vivo.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

I agree with the above and this seems like a reasonable change to me.
Yep, it's not entirely easy to know whether all users of
pm_request_resume() (and similar) are fine with this too, but in
general I think they should be.

So, feel free to add:

Reviewed-by: Ulf Hansson <ulf.hansson@...aro.org>

Kind regards
Uffe

> ---
>  drivers/base/power/main.c |    7 ++++---
>  kernel/power/main.c       |    2 +-
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1647,10 +1647,11 @@ static void device_suspend_late(struct d
>                 goto Complete;
>
>         /*
> -        * Disable runtime PM for the device without checking if there is a
> -        * pending resume request for it.
> +        * After this point, any runtime PM operations targeting the device
> +        * will fail until the corresponding pm_runtime_enable() call in
> +        * device_resume_early().
>          */
> -       __pm_runtime_disable(dev, false);
> +       pm_runtime_disable(dev);
>
>         if (dev->power.syscore)
>                 goto Skip;
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -1125,7 +1125,7 @@ EXPORT_SYMBOL_GPL(pm_wq);
>
>  static int __init pm_start_workqueues(void)
>  {
> -       pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);
> +       pm_wq = alloc_workqueue("pm", WQ_UNBOUND, 0);
>         if (!pm_wq)
>                 return -ENOMEM;
>
>
>
>
