Message-ID: <51ad6831-31c9-945b-dc7a-136013f79c14@huaweicloud.com>
Date: Mon, 15 Sep 2025 09:22:43 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xue He <xue01.he@...sung.com>, yukuai1@...weicloud.com
Cc: axboe@...nel.dk, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] block: plug attempts to batch allocate tags multiple
times
Hi,
在 2025/09/12 11:06, Xue He 写道:
> On 2025/09/03 18:35 PM, Yu Kuai wrote:
>> On 2025/09/03 16:41 PM, Xue He wrote:
>>> On 2025/09/02 08:47 AM, Yu Kuai wrote:
>>>> On 2025/09/01 16:22, Xue He wrote:
>>> ......
>>
>>
>> Yes, so this function will likely obtain fewer tags than nr_tags: the
>> mask always starts from the first zero bit and spans nr_tags bits, and
>> sbitmap_deferred_clear() is called unconditionally, so there are likely
>> non-zero bits within this range.
>>
>> Just wonder, do you consider fixing this directly in
>> __blk_mq_alloc_requests_batch()?
>>
>> - call sbitmap_deferred_clear() and retry on allocation failure, so
>> that the whole word can be used even if previous allocated request are
>> done, especially for nvme with huge tag depths;
>> - retry blk_mq_get_tags() until data->nr_tags is zero;
>
> Hi, Yu Kuai, I'm not entirely sure if I understand correctly, but during
> each tag allocation, sbitmap_deferred_clear() is typically called first,
> as seen in the __sbitmap_queue_get_batch() function.
>
> 	for (i = 0; i < sb->map_nr; i++) {
> 		struct sbitmap_word *map = &sb->map[index];
> 		unsigned long get_mask;
> 		unsigned int map_depth = __map_depth(sb, index);
> 		unsigned long val;
>
> 		sbitmap_deferred_clear(map, 0, 0, 0);
Yes, this is called first each time, and I don't think that is good for
performance anyway. I think it can be delayed until after a first
attempt, like sbitmap_find_bit_in_word() does.
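To illustrate the delayed-clear ordering, here is a minimal userspace
sketch (not the kernel's sbitmap code; struct toy_word, toy_get_batch()
and friends are made-up names): the fast path tries the word as-is, and
only on failure folds the deferred-cleared bits back in and retries
once, mirroring the order used by sbitmap_find_bit_in_word().

```c
#include <stdint.h>

/*
 * Toy model of one sbitmap word: `word` holds allocated bits, `cleared`
 * holds bits freed lazily.  Layout is illustrative only, not the
 * kernel's struct sbitmap_word.
 */
struct toy_word {
	uint64_t word;		/* allocated bits */
	uint64_t cleared;	/* lazily freed bits, folded in on demand */
};

/* Fold the deferred-cleared bits back into the word (the costly step). */
static void toy_deferred_clear(struct toy_word *m)
{
	m->word &= ~m->cleared;
	m->cleared = 0;
}

/*
 * Try to grab up to nr bits starting at the first zero bit.
 * Returns the mask of bits taken, or 0 on failure.  Assumes nr < 64
 * and depth <= 64.
 */
static uint64_t toy_get_batch(struct toy_word *m, unsigned int nr,
			      unsigned int depth)
{
	unsigned int first_zero = 0;
	uint64_t get_mask;

	while (first_zero < depth && (m->word & (1ULL << first_zero)))
		first_zero++;
	if (first_zero == depth)
		return 0;			/* word looks full */
	if (nr > depth - first_zero)
		nr = depth - first_zero;	/* clamp to word depth */
	get_mask = ((1ULL << nr) - 1) << first_zero;
	if (m->word & get_mask)
		return 0;			/* window not fully free */
	m->word |= get_mask;
	return get_mask;
}

/*
 * Delayed-clear variant: try the fast path first, and only pay for
 * toy_deferred_clear() when the word looked full, then retry once.
 */
static uint64_t toy_get_batch_lazy(struct toy_word *m, unsigned int nr,
				   unsigned int depth)
{
	uint64_t mask = toy_get_batch(m, nr, depth);

	if (!mask && m->cleared) {
		toy_deferred_clear(m);
		mask = toy_get_batch(m, nr, depth);
	}
	return mask;
}
```

In this model the successful fast path never touches `cleared`, which is
the saving the delayed ordering is after.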
> ------------------------------------------------------------------------
> so I try to recall blk_mq_get_tags() until data->nr_tags is zero, like:
>
> -	int i, nr = 0;
> -
> -	tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
> -	if (unlikely(!tag_mask))
> -		return NULL;
> -
> -	tags = blk_mq_tags_from_data(data);
> -	for (i = 0; tag_mask; i++) {
> -		if (!(tag_mask & (1UL << i)))
> -			continue;
> -		tag = tag_offset + i;
> -		prefetch(tags->static_rqs[tag]);
> -		tag_mask &= ~(1UL << i);
> -		rq = blk_mq_rq_ctx_init(data, tags, tag);
> -		rq_list_add_head(data->cached_rqs, rq);
> -		nr++;
> -	}
> -	if (!(data->rq_flags & RQF_SCHED_TAGS))
> -		blk_mq_add_active_requests(data->hctx, nr);
> -	/* caller already holds a reference, add for remainder */
> -	percpu_ref_get_many(&data->q->q_usage_counter, nr - 1);
> -	data->nr_tags -= nr;
> +	do {
> +		int i, nr = 0;
> +		tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
> +		if (unlikely(!tag_mask))
> +			return NULL;
> +		tags = blk_mq_tags_from_data(data);
> +		for (i = 0; tag_mask; i++) {
> +			if (!(tag_mask & (1UL << i)))
> +				continue;
> +			tag = tag_offset + i;
> +			prefetch(tags->static_rqs[tag]);
> +			tag_mask &= ~(1UL << i);
> +			rq = blk_mq_rq_ctx_init(data, tags, tag);
> +			rq_list_add_head(data->cached_rqs, rq);
> +			nr++;
> +		}
> +		if (!(data->rq_flags & RQF_SCHED_TAGS))
> +			blk_mq_add_active_requests(data->hctx, nr);
> +		/* caller already holds a reference, add for remainder */
> +		percpu_ref_get_many(&data->q->q_usage_counter, nr - 1);
> +		data->nr_tags -= nr;
> +	} while (data->nr_tags);
>
> I added a loop, and it achieves good results, as before. One question,
> though: since the loop retries tag allocation whenever the required
> number of tags has not been met, is there a risk of an infinite loop?
> I couldn't think of a safer condition to terminate the loop. Do you
> have any suggestions?
Yes, this is what I have in mind. Why do you think there can be an
infinite loop? Each iteration either allocates at least one tag via
blk_mq_get_tags() or returns directly.
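To make the termination argument concrete, here is a toy userspace model
of the do/while loop (toy_get_tags() and toy_batch_alloc() are made-up
stand-ins, not blk-mq functions): every successful toy_get_tags() call
returns a non-zero mask, so nr_tags strictly decreases each round, and a
zero mask exits immediately, so the loop is bounded.

```c
#include <stdint.h>

/*
 * Stand-in for blk_mq_get_tags(): hand out up to `want` tags from a
 * counter-based pool, at most 2 per call to force multiple iterations.
 * Returns a mask with one bit per tag obtained, or 0 on failure.
 */
static uint64_t toy_get_tags(unsigned int *pool, unsigned int want)
{
	unsigned int got = want;

	if (got > *pool)
		got = *pool;
	if (got > 2)
		got = 2;	/* pretend each batch finds at most 2 bits */
	if (!got)
		return 0;	/* allocation failure: caller must bail out */
	*pool -= got;
	return (1ULL << got) - 1;
}

/*
 * Mirror of the proposed do/while loop.  Returns the number of
 * iterations taken, or -1 on allocation failure (the `return NULL`
 * path in the patch).
 */
static int toy_batch_alloc(unsigned int nr_tags, unsigned int pool)
{
	int iters = 0;

	do {
		uint64_t mask = toy_get_tags(&pool, nr_tags);

		if (!mask)
			return -1;
		iters++;
		/* popcount of the mask == tags obtained this round */
		nr_tags -= __builtin_popcountll(mask);
	} while (nr_tags);

	return iters;
}
```

Since nr_tags drops by at least one per iteration and failure returns
at once, the loop runs at most nr_tags times.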
Thanks,
Kuai
>
> Thanks,
> Xue
>
> .
>