Message-ID: <2e2da599-b67d-251a-e0d1-348e9208675b@huaweicloud.com>
Date: Fri, 15 Aug 2025 09:54:22 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Nilay Shroff <nilay@...ux.ibm.com>, Yu Kuai <yukuai1@...weicloud.com>,
 axboe@...nel.dk, bvanassche@....org, hare@...e.de, ming.lei@...hat.com
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 yi.zhang@...wei.com, yangerkun@...wei.com, johnny.chenyi@...wei.com,
 "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 08/16] blk-mq: fix blk_mq_tags double free while
 nr_requests grown

Hi,

On 2025/08/14 20:15, Nilay Shroff wrote:
> 
> 
> On 8/14/25 9:05 AM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> When the user triggers tags growth through the queue sysfs attribute
>> nr_requests, hctx->sched_tags is freed directly and replaced with
>> newly allocated tags; see blk_mq_tag_update_depth().
>>
>> The problem is that hctx->sched_tags comes from elevator->et->tags,
>> while et->tags still points to the freed tags, hence a later elevator
>> exit will try to free the tags again, causing a kernel panic.
>>
>> Fix this problem by using the new helper blk_mq_alloc_sched_tags() to
>> allocate new sched_tags. Meanwhile, a long-standing problem can be
>> fixed as well:
>>
>> If blk_mq_tag_update_depth() succeeds for a previous hctx, its bitmap
>> depth is updated; however, if a following hctx fails, q->nr_requests
>> is not updated and the previous hctx->sched_tags ends up bigger than
>> q->nr_requests.
>>
>> Fixes: f5a6604f7a44 ("block: fix lockdep warning caused by lock dependency in elv_iosched_store")
>> Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> ---
>>   block/blk-mq.c | 31 ++++++++++++++++++++-----------
>>   1 file changed, 20 insertions(+), 11 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index a7d6a20c1524..f1c11f591c27 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -4917,6 +4917,23 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>>   }
>>   EXPORT_SYMBOL(blk_mq_free_tag_set);
>>   
>> +static int blk_mq_sched_grow_tags(struct request_queue *q, unsigned int nr)
>> +{
>> +	struct elevator_tags *et =
>> +		blk_mq_alloc_sched_tags(q->tag_set, q->nr_hw_queues, nr);
>> +	struct blk_mq_hw_ctx *hctx;
>> +	unsigned long i;
>> +
>> +	if (!et)
>> +		return -ENOMEM;
>> +
>> +	blk_mq_free_sched_tags(q->elevator->et, q->tag_set);
>> +	queue_for_each_hw_ctx(q, hctx, i)
>> +		hctx->sched_tags = et->tags[i];
>> +	q->elevator->et = et;
>> +	return 0;
>> +}
>> +
> I see that we're allocating/freeing sched tags here after we freeze the
> queue and acquire ->elevator_lock. And so this shall cause the lockdep
> splat we saw earlier[1]. I'm not saying that your proposed change would
> cause it, but I think this is one of the code paths we missed handling.
> So while you're at it, please ensure that we don't get into this splat
> again. We've already fixed this splat in other code paths (elevator
> change and nr_hw_queues update), but it seems we need to handle this
> case here as well.

Thanks for pointing this out. I was thinking the gfp flags inside
blk_mq_alloc_sched_tags() would be enough to prevent deadlocks related
to fs_reclaim. I still need to review a lot to catch up.

Looks like I need to allocate memory before freezing the queue in
queue_requests_store(). I think this fits well with the previous
cleanups, and we'll know exactly whether memory allocation is necessary.

BTW, just curious: do we have, or plan to write, documentation anywhere
explaining the lock ordering? It would make it much easier for people
like me to add new code in this area.

Thanks
Kuai

> 
> [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@linux.ibm.com/
> 
> Thanks,
> --Nilay
> .
> 

