[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20250912030601.236426-1-xue01.he@samsung.com>
Date: Fri, 12 Sep 2025 03:06:01 +0000
From: Xue He <xue01.he@...sung.com>
To: yukuai1@...weicloud.com
Cc: axboe@...nel.dk, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, xue01.he@...sung.com, yukuai3@...wei.com
Subject: Re: [PATCH] block: plug attempts to batch allocate tags multiple
times
On 2025/09/03 18:35 PM, Yu Kuai wrote:
>On 2025/09/03 16:41 PM, Xue He wrote:
>> On 2025/09/02 08:47 AM, Yu Kuai wrote:
>>> On 2025/09/01 16:22, Xue He wrote:
>> ......
>
>
>Yes, so this function will likely to obtain less tags than nr_tags,the
>mask is always start from first zero bit with nr_tags bit, and
>sbitmap_deferred_clear() is called uncondionally, it's likely there are
>non-zero bits within this range.
>
>Just wonder, do you consider fixing this directly in
>__blk_mq_alloc_requests_batch()?
>
> - call sbitmap_deferred_clear() and retry on allocation failure, so
>that the whole word can be used even if previous allocated request are
>done, especially for nvme with huge tag depths;
> - retry blk_mq_get_tags() until data->nr_tags is zero;
Hi, Yu Kuai, I'm not entirely sure if I understand correctly, but during
each tag allocation, sbitmap_deferred_clear() is typically called first,
as seen in the __sbitmap_queue_get_batch() function.
for (i = 0; i < sb->map_nr; i++) {
struct sbitmap_word *map = &sb->map[index];
unsigned long get_mask;
unsigned int map_depth = __map_depth(sb, index);
unsigned long val;
sbitmap_deferred_clear(map, 0, 0, 0);
------------------------------------------------------------------------
so I try to recall blk_mq_get_tags() until data->nr_tags is zero, like:
- int i, nr = 0;
- tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
- if (unlikely(!tag_mask))
- return NULL;
-
- tags = blk_mq_tags_from_data(data);
- for (i = 0; tag_mask; i++) {
- if (!(tag_mask & (1UL << i)))
- continue;
- tag = tag_offset + i;
- prefetch(tags->static_rqs[tag]);
- tag_mask &= ~(1UL << i);
- rq = blk_mq_rq_ctx_init(data, tags, tag);
- rq_list_add_head(data->cached_rqs, rq);
- nr++;
- }
- if (!(data->rq_flags & RQF_SCHED_TAGS))
- blk_mq_add_active_requests(data->hctx, nr);
- /* caller already holds a reference, add for remainder */
- percpu_ref_get_many(&data->q->q_usage_counter, nr - 1);
- data->nr_tags -= nr;
+ do {
+ int i, nr = 0;
+ tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
+ if (unlikely(!tag_mask))
+ return NULL;
+ tags = blk_mq_tags_from_data(data);
+ for (i = 0; tag_mask; i++) {
+ if (!(tag_mask & (1UL << i)))
+ continue;
+ tag = tag_offset + i;
+ prefetch(tags->static_rqs[tag]);
+ tag_mask &= ~(1UL << i);
+ rq = blk_mq_rq_ctx_init(data, tags, tag);
+ rq_list_add_head(data->cached_rqs, rq);
+ nr++;
+ }
+ if (!(data->rq_flags & RQF_SCHED_TAGS))
+ blk_mq_add_active_requests(data->hctx, nr);
+ /* caller already holds a reference, add for remainder */
+ percpu_ref_get_many(&data->q->q_usage_counter, nr - 1);
+ data->nr_tags -= nr;
+ } while (data->nr_tags);
I added a loop structure, it also achieve a good results like before,
but I have a question: although the loop will retry tag allocation
when the required number of tags is not met, there is a risk of an
infinite loop, right? However, I couldn't think of a safer condition
to terminate the loop. Do you have any suggestions?
Thanks,
Xue
Powered by blists - more mailing lists