linux-kernel - [PATCH v2] PM: sleep: Do not flag runtime PM workqueue as freezable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <12794222.O9o76ZdvQC@rafael.j.wysocki>
Date: Fri, 05 Dec 2025 16:24:05 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: linux-pm@...r.kernel.org
Cc: YangYang <yang.yang@...o.com>, Bart Van Assche <bvanassche@....org>,
 Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, Ulf Hansson <ulf.hansson@...aro.org>
Subject: [PATCH v2] PM: sleep: Do not flag runtime PM workqueue as freezable

From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

Till now, the runtime PM workqueue has been flagged as freezable, so it
does not process work items during system-wide PM transitions like
system suspend and resume.  The original reason to do that was to
reduce the likelihood of runtime PM getting in the way of system-wide
PM processing, but now it is mostly an optimization because (1) runtime
suspend of devices is prevented by bumping up their runtime PM usage
counters in device_prepare() and (2) device drivers are expected to
disable runtime PM for the devices handled by them before they embark
on system-wide PM activities that may change the state of the hardware
or otherwise interfere with runtime PM.  However, it prevents
asynchronous runtime resume of devices from working during system-wide
PM transitions, which is confusing because synchronous runtime resume
is not prevented at the same time, and it also sometimes turns out to
be problematic.

For example, it has been reported that blk_queue_enter() may deadlock
during a system suspend transition because of the pm_request_resume()
usage in it [1].  That happens because the asynchronous runtime resume
of the given device is not processed due to the freezing of the runtime
PM workqueue.  While it may be better to address this particular issue
in the block layer, the very presence of it means that similar problems
may be expected to occur elsewhere.

For this reason, remove the WQ_FREEZABLE flag from the runtime PM
workqueue and make device_suspend_late() use the generic variant of
pm_runtime_disable() that will carry out runtime PM of the device
synchronously if there is pending resume work for it.

Also update the comment before the pm_runtime_disable() call in
device_suspend_late(), to document the fact that the runtime PM
should not be expected to work for the device until the end of
device_resume_early(), and update the related documentation.

This change may, even though it is not expected to, uncover some
latent issues related to queuing up asynchronous runtime resume
work items during system suspend or hibernation.  However, they
should be limited to the interference between runtime resume and
system-wide PM callbacks in the cases when device drivers start
to handle system-wide PM before disabling runtime PM as described
above.

Link: https://lore.kernel.org/linux-pm/20251126101636.205505-2-yang.yang@vivo.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Reviewed-by: Ulf Hansson <ulf.hansson@...aro.org>
---

v1 -> v2:
   * Update documentation in runtime_pm.rst.
   * Add R-by from Ulf.

---
 Documentation/power/runtime_pm.rst |    7 +++----
 drivers/base/power/main.c          |    7 ++++---
 kernel/power/main.c                |    2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

--- a/Documentation/power/runtime_pm.rst
+++ b/Documentation/power/runtime_pm.rst
@@ -714,10 +714,9 @@ out the following operations:
   * During system suspend pm_runtime_get_noresume() is called for every device
     right before executing the subsystem-level .prepare() callback for it and
     pm_runtime_barrier() is called for every device right before executing the
-    subsystem-level .suspend() callback for it.  In addition to that the PM core
-    calls __pm_runtime_disable() with 'false' as the second argument for every
-    device right before executing the subsystem-level .suspend_late() callback
-    for it.
+    subsystem-level .suspend() callback for it.  In addition to that, the PM
+    core disables runtime PM for every device right before executing the
+    subsystem-level .suspend_late() callback for it.

   * During system resume pm_runtime_enable() and pm_runtime_put() are called for
     every device right after executing the subsystem-level .resume_early()
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1647,10 +1647,11 @@ static void device_suspend_late(struct d
 		goto Complete;

 	/*
-	 * Disable runtime PM for the device without checking if there is a
-	 * pending resume request for it.
+	 * After this point, any runtime PM operations targeting the device
+	 * will fail until the corresponding pm_runtime_enable() call in
+	 * device_resume_early().
 	 */
-	__pm_runtime_disable(dev, false);
+	pm_runtime_disable(dev);

 	if (dev->power.syscore)
 		goto Skip;
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -1125,7 +1125,7 @@ EXPORT_SYMBOL_GPL(pm_wq);

 static int __init pm_start_workqueues(void)
 {
-	pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);
+	pm_wq = alloc_workqueue("pm", WQ_UNBOUND, 0);
 	if (!pm_wq)
 		return -ENOMEM;