Date:   Mon, 15 Apr 2019 11:46:04 +0200
From:   Roman Penyaev <>
To:     Bart Van Assche <>
Cc:     Bob Liu <>, Roman Pen <>,
        Akinobu Mita <>,
        Tejun Heo <>, Jens Axboe <>,
        Christoph Hellwig <>
Subject: Re: [RESEND PATCH] blk-mq: fix hang caused by freeze/unfreeze

On 2019-04-13 05:42, Bart Van Assche wrote:
> On 4/9/19 2:08 AM, Bob Liu wrote:
>>  void blk_freeze_queue_start(struct request_queue *q)
>>  {
>> -	int freeze_depth;
>> -
>> -	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
>> -	if (freeze_depth == 1) {
>> +	mutex_lock(&q->mq_freeze_lock);
>> +	if (++q->mq_freeze_depth == 1) {
>>  		percpu_ref_kill(&q->q_usage_counter);
>> +		mutex_unlock(&q->mq_freeze_lock);
>>  		if (queue_is_mq(q))
>>  			blk_mq_run_hw_queues(q, false);
>> +	} else {
>> +		mutex_unlock(&q->mq_freeze_lock);
>>  	}
>>  }
> Have you considered to move the mutex_unlock() call to the end of the
> function such that there is only one mutex_unlock() call instead of two?
> In case you would be worried about holding the mutex around the code
> that runs the queue, how about changing the blk_mq_run_hw_queues() call
> such that the queues are run async?

Hi Bart,

The only purpose of 'mq_freeze_lock' is to avoid a race between the
'mq_freeze_depth' variable and the following usage of the q_usage_counter
percpu ref.  I admit my original comment is quite unclear, but the locked
section should be as short as possible, so returning to your question:
it is better to have two unlock calls instead of expanding the locked
critical section.

Unfortunately I do not have hardware to play again with the issue, but I
think there is a nice candidate for a quick reproduction: null_blk queues
with shared tags.  Having several queues with shared tags and a script
which powers on/off (I mean the 'power' entry of the null_blk configfs)
null devices from different CPUs, it is quite possible to trigger the
race.  A random short mdelay() in the correct places can help to increase
the probability of hitting the issue quite fast.
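A repro along those lines could look roughly like the sketch below.  This
is an untested assumption of the setup being described, not a verified
reproducer: it assumes a kernel with null_blk built in or loadable, a
mounted configfs, root privileges, and the standard null_blk configfs
layout with its 'power' attribute; device names and CPU count are
arbitrary.

```shell
#!/bin/sh
# Hedged repro sketch: race concurrent freezes/unfreezes against each
# other by toggling null_blk devices from different CPUs.
modprobe null_blk nr_devices=0 shared_tags=1

cfg=/sys/kernel/config/nullb
for i in 0 1 2 3; do
	mkdir -p "$cfg/nullb$i"
done

# Toggle 'power' of each device in a tight loop, pinned to a different
# CPU, so device creation/teardown (and the freezes they imply) overlap.
for i in 0 1 2 3; do
	taskset -c "$i" sh -c "
		while :; do
			echo 1 > $cfg/nullb$i/power
			echo 0 > $cfg/nullb$i/power
		done" &
done
wait
```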

But Bob, what is the backtrace of the issue you hit?  Conditions to
reproduce the issue are quite specific, and frankly I did not find any
"naked" (without any locks) calls of blk_mq_freeze/unfreeze; the only
candidate which I found seems to be null_blk (not 100% sure, but worth
checking).
