Message-ID: <8fa4023f-50f2-4e25-9f9b-4e5236015e27@vivo.com>
Date: Mon, 1 Dec 2025 20:56:06 +0800
From: YangYang <yang.yang@...o.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
Bart Van Assche <bvanassche@....org>
Cc: Jens Axboe <axboe@...nel.dk>, Pavel Machek <pavel@...nel.org>,
Len Brown <lenb@...nel.org>, Greg Kroah-Hartman
<gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume
and runtime disable
On 2025/12/1 17:46, YangYang wrote:
> On 2025/11/27 20:34, Rafael J. Wysocki wrote:
>> On Wed, Nov 26, 2025 at 11:47 PM Bart Van Assche <bvanassche@....org> wrote:
>>>
>>> On 11/26/25 1:30 PM, Rafael J. Wysocki wrote:
>>>> On Wed, Nov 26, 2025 at 10:11 PM Bart Van Assche <bvanassche@....org> wrote:
>>>>>
>>>>> On 11/26/25 12:17 PM, Rafael J. Wysocki wrote:
>>>>>> --- a/block/blk-core.c
>>>>>> +++ b/block/blk-core.c
>>>>>> @@ -309,6 +309,8 @@ int blk_queue_enter(struct request_queue
>>>>>>  		if (flags & BLK_MQ_REQ_NOWAIT)
>>>>>>  			return -EAGAIN;
>>>>>>  
>>>>>> +		/* if necessary, resume .dev (assume success). */
>>>>>> +		blk_pm_resume_queue(pm, q);
>>>>>>  		/*
>>>>>>  		 * read pair of barrier in blk_freeze_queue_start(), we need to
>>>>>>  		 * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
>>>>>
>>>>> blk_queue_enter() may be called from the suspend path so I don't think
>>>>> that the above change will work.
>>>>
>>>> Why would the existing code work then?
>>>
>>> The existing code works reliably on a very large number of devices.
>>
>> Well, except that it doesn't work during system suspend and
>> hibernation when the PM workqueue is frozen. I think that we agree
>> here.
>>
>> This needs to be addressed because it may very well cause system
>> suspend to deadlock.
>>
>> There are two possible ways I can think of to address it:
>>
>> 1. Changing blk_pm_resume_queue() and its users to carry out a
>> synchronous resume of q->dev instead of calling pm_request_resume()
>> and (effectively) waiting for the queued-up runtime resume of q->dev
>> to take effect.
>>
>> This would be my preferred option, but at this point I'm not sure if
>> it's viable.
>>
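For concreteness, I read option 1 as something like the sketch below
(untested, against the current blk_pm_resume_queue() in block/blk-pm.h,
which calls pm_request_resume() here):

static inline void blk_pm_resume_queue(const bool pm, struct request_queue *q)
{
	/* Sketch: resume q->dev synchronously instead of queuing a request. */
	if (!pm && blk_queue_pm_only(q))
		pm_runtime_resume(q->dev);
}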
>
> After __pm_runtime_disable() is called from device_suspend_late(),
> dev->power.disable_depth becomes nonzero, which prevents rpm_resume()
> from making progress until runtime PM is re-enabled during system
> resume, regardless of whether rpm_resume() is invoked synchronously or
> asynchronously.
> Performing a synchronous resume of q->dev therefore seems to have much
> the same effect as removing the following code block from
> __pm_runtime_barrier(), which is invoked by __pm_runtime_disable():
>
> 	if (dev->power.request_pending) {
> 		dev->power.request = RPM_REQ_NONE;
> 		spin_unlock_irq(&dev->power.lock);
> 
> 		cancel_work_sync(&dev->power.work);
> 
> 		spin_lock_irq(&dev->power.lock);
> 		dev->power.request_pending = false;
> 	}
>
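For reference, this is roughly the check that stops rpm_resume() from
making progress while runtime PM is disabled (simplified from
rpm_resume() in drivers/base/power/runtime.c; details vary by kernel
version):

	/* Early exit in rpm_resume(), with dev->power.lock held. */
	if (dev->power.runtime_error) {
		retval = -EINVAL;
	} else if (dev->power.disable_depth > 0) {
		/* Only a device left RPM_ACTIVE across the disable succeeds. */
		if (dev->power.runtime_status == RPM_ACTIVE &&
		    dev->power.last_status == RPM_ACTIVE)
			retval = 1;
		else
			retval = -EACCES;
	}
	if (retval)
		goto out;
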
Since both the synchronous and the asynchronous resume paths run into
the same limitation, it may be sufficient to keep using the
asynchronous path, as long as pending work items are not canceled while
the PM workqueue is frozen. The pending work can then proceed normally
once the PM workqueue is unfrozen.

---
 drivers/base/power/main.c    |  2 +-
 drivers/base/power/runtime.c | 17 +++++++++++------
 include/linux/pm_runtime.h   |  6 +++---
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 1de1cd72b616..d5c3d7a6777e 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1635,7 +1635,7 @@ static void device_suspend_late(struct device *dev, pm_message_t state, bool asy
 	 * Disable runtime PM for the device without checking if there is a
 	 * pending resume request for it.
 	 */
-	__pm_runtime_disable(dev, false);
+	__pm_runtime_disable(dev, false, true);
 
 	if (dev->power.syscore)
 		goto Skip;
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 1b11a3cd4acc..ff3fdfba2dc8 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1421,11 +1421,16 @@ EXPORT_SYMBOL_GPL(__pm_runtime_set_status);
  *
  * Should be called under dev->power.lock with interrupts disabled.
  */
-static void __pm_runtime_barrier(struct device *dev)
+static void __pm_runtime_barrier(struct device *dev, bool frozen)
 {
 	pm_runtime_deactivate_timer(dev);
 
-	if (dev->power.request_pending) {
+	/*
+	 * If the PM workqueue has already been frozen, the following
+	 * operations are unnecessary. This allows any pending work to
+	 * continue execution once the PM workqueue is unfrozen.
+	 */
+	if (!frozen && dev->power.request_pending) {
 		dev->power.request = RPM_REQ_NONE;
 		spin_unlock_irq(&dev->power.lock);
 
@@ -1485,7 +1490,7 @@ int pm_runtime_barrier(struct device *dev)
 		retval = 1;
 	}
 
-	__pm_runtime_barrier(dev);
+	__pm_runtime_barrier(dev, false);
 
 	spin_unlock_irq(&dev->power.lock);
 	pm_runtime_put_noidle(dev);
@@ -1519,7 +1524,7 @@ void pm_runtime_unblock(struct device *dev)
 	spin_unlock_irq(&dev->power.lock);
 }
 
-void __pm_runtime_disable(struct device *dev, bool check_resume)
+void __pm_runtime_disable(struct device *dev, bool check_resume, bool frozen)
 {
 	spin_lock_irq(&dev->power.lock);
 
@@ -1550,7 +1555,7 @@ void __pm_runtime_disable(struct device *dev, bool check_resume)
 	update_pm_runtime_accounting(dev);
 
 	if (!dev->power.disable_depth++) {
-		__pm_runtime_barrier(dev);
+		__pm_runtime_barrier(dev, frozen);
 		dev->power.last_status = dev->power.runtime_status;
 	}
 
@@ -1893,7 +1898,7 @@ void pm_runtime_reinit(struct device *dev)
  */
 void pm_runtime_remove(struct device *dev)
 {
-	__pm_runtime_disable(dev, false);
+	__pm_runtime_disable(dev, false, false);
 	pm_runtime_reinit(dev);
 }
 
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 0b436e15f4cd..102060a9ebc7 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -80,7 +80,7 @@ extern int pm_runtime_barrier(struct device *dev);
 extern bool pm_runtime_block_if_disabled(struct device *dev);
 extern void pm_runtime_unblock(struct device *dev);
 extern void pm_runtime_enable(struct device *dev);
-extern void __pm_runtime_disable(struct device *dev, bool check_resume);
+extern void __pm_runtime_disable(struct device *dev, bool check_resume, bool frozen);
 extern void pm_runtime_allow(struct device *dev);
 extern void pm_runtime_forbid(struct device *dev);
 extern void pm_runtime_no_callbacks(struct device *dev);
@@ -288,7 +288,7 @@ static inline int pm_runtime_barrier(struct device *dev) { return 0; }
 static inline bool pm_runtime_block_if_disabled(struct device *dev) { return true; }
 static inline void pm_runtime_unblock(struct device *dev) {}
 static inline void pm_runtime_enable(struct device *dev) {}
-static inline void __pm_runtime_disable(struct device *dev, bool c) {}
+static inline void __pm_runtime_disable(struct device *dev, bool c, bool f) {}
 static inline bool pm_runtime_blocked(struct device *dev) { return true; }
 static inline void pm_runtime_allow(struct device *dev) {}
 static inline void pm_runtime_forbid(struct device *dev) {}
@@ -775,7 +775,7 @@ static inline int pm_runtime_set_suspended(struct device *dev)
  */
 static inline void pm_runtime_disable(struct device *dev)
 {
-	__pm_runtime_disable(dev, true);
+	__pm_runtime_disable(dev, true, false);
 }
 
 /**
--