linux-kernel - Re: [PATCH -next v4] blk-mq: fix tag

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d7f51067-f5a8-e78c-5ece-c1ef132b9b9a@huawei.com>
Date:   Wed, 12 Jan 2022 12:18:53 +0800
From:   QiuLaibin <qiulaibin@...wei.com>
To:     Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
CC:     <axboe@...nel.dk>, <ming.lei@...hat.com>, <john.garry@...wei.com>,
        <martin.petersen@...cle.com>, <hare@...e.de>,
        <johannes.thumshirn@....com>, <bvanassche@....org>,
        <linux-block@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -next v4] blk-mq: fix tag_get wait task can't be awakened



On 2022/1/11 22:15, Andy Shevchenko wrote:
> On Tue, Jan 11, 2022 at 10:02:16PM +0800, Laibin Qiu wrote:
>> In case of shared tags, there might be more than one hctx which
>> allocates from the same tags, and each hctx is limited to allocate at
>> most:
>>          hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U);
>>
>> tag idle detection is lazy, and may be delayed for 30sec, so there
>> could be just one real active hctx(queue) but all others are actually
>> idle and still accounted as active because of the lazy idle detection.
>> Then if wake_batch is > hctx_max_depth, driver tag allocation may wait
>> forever on this real active hctx.
>>
>> Fix this by recalculating wake_batch when inc or dec active_queues.
> 
> ...
> 
>>   {
>> +	unsigned int users;
> 
> Missed blank line here.
Thanks, i will modify it in V5.
> 
>>   	if (blk_mq_is_shared_tags(hctx->flags)) {
>>   		struct request_queue *q = hctx->queue;
>>   
>> +		if (test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) ||
>> +		    test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) {
> 
> Whoever wrote this code did too much defensive programming, because the first
> conditional doesn't make much sense here. Am I right?
> 
I think because this judgement is in the general IO process, there are 
also some performance considerations here.
>> +			return true;
>> +		}
>>   	} else {
> 
>> +		if (test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) ||
>> +		    test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) {
> 
> Ditto.
> 
>> +			return true;
>> +		}
>>   	}
> 
> ...
> 
>> +	unsigned int wake_batch = clamp_t(unsigned int,
>> +			(sbq->sb.depth + users - 1) / users, 4U, SBQ_WAKE_BATCH);
> 
> 
> 	unsigned int wake_batch;
> 
> 	wake_batch = clamp_val((sbq->sb.depth + users - 1) / users, 4, SBQ_WAKE_BATCH);
> 	...
> 
> is easier to read, no?

Here I refer to the calculation method in sbq_calc_wake_batch(). And I 
will separate the definition from the calculation in V5.

>