lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <18d454ab-1a14-4e7c-90e7-831ec4d7ddae@vivo.com>
Date: Fri, 28 Nov 2025 15:20:16 +0800
From: YangYang <yang.yang@...o.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Bart Van Assche <bvanassche@....org>, Jens Axboe <axboe@...nel.dk>,
 Pavel Machek <pavel@...nel.org>, Len Brown <lenb@...nel.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Danilo Krummrich <dakr@...nel.org>, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH 1/2] PM: runtime: Fix I/O hang due to race between resume
 and runtime disable

On 2025/11/27 20:44, Rafael J. Wysocki wrote:
> On Thu, Nov 27, 2025 at 12:29 PM YangYang <yang.yang@...o.com> wrote:
>>
>> On 2025/11/27 2:40, Bart Van Assche wrote:
>>> On 11/26/25 7:41 AM, Rafael J. Wysocki wrote:
>>>> As it stands, you have a basic problem with respect to system
>>>> suspend/hibernation.  As I said before, the PM workqueue is frozen
>>>> during system suspend/hibernation transitions, so waiting for an async
>>>> resume request to complete then is pointless.
>>>
>>> Agreed. I noticed that any attempt to call request_firmware() from
>>> driver system resume callback functions causes a deadlock if these
>>> calls happen before the block device has been resumed.
>>>
>>> Thanks,
>>>
>>> Bart.
>>
>> Does this patch look reasonable to you? It hasn't been fully tested
>> yet, but the resume is now performed synchronously.
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 66fb2071d..041d29ba4 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -323,12 +323,15 @@ int blk_queue_enter(struct request_queue *q,
>> blk_mq_req_flags_t flags)
>>                    * reordered.
>>                    */
>>                   smp_rmb();
>> -               wait_event(q->mq_freeze_wq,
>> -                          (!q->mq_freeze_depth &&
>> -                           blk_pm_resume_queue(pm, q)) ||
>> -                          blk_queue_dying(q));
>> +check:
>> +               wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);
> 
> I think that you still need to check blk_queue_dying(q) under
> wait_even() or you may not stop waiting when this happens.
> 

Got it.

>> +
>>                   if (blk_queue_dying(q))
>>                           return -ENODEV;
>> +               if (!blk_pm_resume_queue(pm, q)) {
>> +                       pm_runtime_resume(q->dev);
>> +                       goto check;
>> +               }
>>           }
>>
>>           rwsem_acquire_read(&q->q_lockdep_map, 0, 0, _RET_IP_);
>> @@ -356,12 +359,15 @@ int __bio_queue_enter(struct request_queue *q,
>> struct bio *bio)
>>                    * reordered.
>>                    */
>>                   smp_rmb();
>> -               wait_event(q->mq_freeze_wq,
>> -                          (!q->mq_freeze_depth &&
>> -                           blk_pm_resume_queue(false, q)) ||
>> -                          test_bit(GD_DEAD, &disk->state));
>> +check:
>> +               wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);
> 
> Analogously here, you may not stop waiting when test_bit(GD_DEAD,
> &disk->state) is true.
> 

Got it.

>> +
>>                   if (test_bit(GD_DEAD, &disk->state))
>>                           goto dead;
>> +               if (!blk_pm_resume_queue(false, q)) {
>> +                       pm_runtime_resume(q->dev);
>> +                       goto check;
>> +               }
>>           }
>>
>>           rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
>> diff --git a/block/blk-pm.h b/block/blk-pm.h
>> index 8a5a0d4b3..c28fad105 100644
>> --- a/block/blk-pm.h
>> +++ b/block/blk-pm.h
>> @@ -12,7 +12,6 @@ static inline int blk_pm_resume_queue(const bool pm,
>> struct request_queue *q)
>>                   return 1;       /* Nothing to do */
>>           if (pm && q->rpm_status != RPM_SUSPENDED)
>>                   return 1;       /* Request allowed */
>> -       pm_request_resume(q->dev);
>>           return 0;
>>    }
> 
> And I would rename blk_pm_resume_queue() to something like
> blk_pm_queue_active() because it is a bit confusing as it stands.
> 
> Apart from the above remarks this makes sense to me FWIW.

Got it. I'll fix these in the next version and run some tests before 
sending it out. Thanks for the review.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ