Message-ID: <51ad6831-31c9-945b-dc7a-136013f79c14@huaweicloud.com>
Date: Mon, 15 Sep 2025 09:22:43 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xue He <xue01.he@...sung.com>, yukuai1@...weicloud.com
Cc: axboe@...nel.dk, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] block: plug attempts to batch allocate tags multiple
times
Hi,
在 2025/09/12 11:06, Xue He 写道:
> On 2025/09/03 18:35 PM, Yu Kuai wrote:
>> On 2025/09/03 16:41 PM, Xue He wrote:
>>> On 2025/09/02 08:47 AM, Yu Kuai wrote:
>>>> On 2025/09/01 16:22, Xue He wrote:
>>> ......
>>
>>
>> Yes, so this function will likely obtain fewer tags than nr_tags: the
>> mask always starts from the first zero bit and spans nr_tags bits, and
>> sbitmap_deferred_clear() is called unconditionally, so there are likely
>> non-zero bits within this range.
>>
>> Just wonder, do you consider fixing this directly in
>> __blk_mq_alloc_requests_batch()?
>>
>> - call sbitmap_deferred_clear() and retry on allocation failure, so
>> that the whole word can be used even if previous allocated request are
>> done, especially for nvme with huge tag depths;
>> - retry blk_mq_get_tags() until data->nr_tags is zero;
>
> Hi, Yu Kuai, I'm not entirely sure if I understand correctly, but during
> each tag allocation, sbitmap_deferred_clear() is typically called first,
> as seen in the __sbitmap_queue_get_batch() function.
>
> 	for (i = 0; i < sb->map_nr; i++) {
> 		struct sbitmap_word *map = &sb->map[index];
> 		unsigned long get_mask;
> 		unsigned int map_depth = __map_depth(sb, index);
> 		unsigned long val;
>
> 		sbitmap_deferred_clear(map, 0, 0, 0);
Yes, this is called first each time, and I don't think that is good for
performance anyway. I think it can be delayed until after a first
attempt, like sbitmap_find_bit_in_word() does.
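To illustrate the delayed-clear ordering, here is a minimal userspace
sketch (not the kernel's sbitmap code; struct toy_word, toy_get_batch()
and friends are made-up names): the fast path tries the word as-is, and
only on failure folds the deferred-cleared bits back in and retries
once, mirroring the order used by sbitmap_find_bit_in_word().

```c
#include <stdint.h>

/*
 * Toy model of one sbitmap word: `word` holds allocated bits, `cleared`
 * holds bits freed lazily.  Layout is illustrative only, not the
 * kernel's struct sbitmap_word.
 */
struct toy_word {
	uint64_t word;		/* allocated bits */
	uint64_t cleared;	/* lazily freed bits, folded in on demand */
};

/* Fold the deferred-cleared bits back into the word (the costly step). */
static void toy_deferred_clear(struct toy_word *m)
{
	m->word &= ~m->cleared;
	m->cleared = 0;
}

/*
 * Try to grab up to nr bits starting at the first zero bit.
 * Returns the mask of bits taken, or 0 on failure.  Assumes nr < 64
 * and depth <= 64.
 */
static uint64_t toy_get_batch(struct toy_word *m, unsigned int nr,
			      unsigned int depth)
{
	unsigned int first_zero = 0;
	uint64_t get_mask;

	while (first_zero < depth && (m->word & (1ULL << first_zero)))
		first_zero++;
	if (first_zero == depth)
		return 0;			/* word looks full */
	if (nr > depth - first_zero)
		nr = depth - first_zero;	/* clamp to word depth */
	get_mask = ((1ULL << nr) - 1) << first_zero;
	if (m->word & get_mask)
		return 0;			/* window not fully free */
	m->word |= get_mask;
	return get_mask;
}

/*
 * Delayed-clear variant: try the fast path first, and only pay for
 * toy_deferred_clear() when the word looked full, then retry once.
 */
static uint64_t toy_get_batch_lazy(struct toy_word *m, unsigned int nr,
				   unsigned int depth)
{
	uint64_t mask = toy_get_batch(m, nr, depth);

	if (!mask && m->cleared) {
		toy_deferred_clear(m);
		mask = toy_get_batch(m, nr, depth);
	}
	return mask;
}
```

In this model the successful fast path never touches `cleared`, which is
the saving the delayed ordering is after.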
> ------------------------------------------------------------------------
> so I try to recall blk_mq_get_tags() until data->nr_tags is zero, like:
>
> -	int i, nr = 0;
> -
> -	tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
> -	if (unlikely(!tag_mask))
> -		return NULL;
> -
> -	tags = blk_mq_tags_from_data(data);
> -	for (i = 0; tag_mask; i++) {
> -		if (!(tag_mask & (1UL << i)))
> -			continue;
> -		tag = tag_offset + i;
> -		prefetch(tags->static_rqs[tag]);
> -		tag_mask &= ~(1UL << i);
> -		rq = blk_mq_rq_ctx_init(data, tags, tag);
> -		rq_list_add_head(data->cached_rqs, rq);
> -		nr++;
> -	}
> -	if (!(data->rq_flags & RQF_SCHED_TAGS))
> -		blk_mq_add_active_requests(data->hctx, nr);
> -	/* caller already holds a reference, add for remainder */
> -	percpu_ref_get_many(&data->q->q_usage_counter, nr - 1);
> -	data->nr_tags -= nr;
> +	do {
> +		int i, nr = 0;
> +		tag_mask = blk_mq_get_tags(data, data->nr_tags, &tag_offset);
> +		if (unlikely(!tag_mask))
> +			return NULL;
> +		tags = blk_mq_tags_from_data(data);
> +		for (i = 0; tag_mask; i++) {
> +			if (!(tag_mask & (1UL << i)))
> +				continue;
> +			tag = tag_offset + i;
> +			prefetch(tags->static_rqs[tag]);
> +			tag_mask &= ~(1UL << i);
> +			rq = blk_mq_rq_ctx_init(data, tags, tag);
> +			rq_list_add_head(data->cached_rqs, rq);
> +			nr++;
> +		}
> +		if (!(data->rq_flags & RQF_SCHED_TAGS))
> +			blk_mq_add_active_requests(data->hctx, nr);
> +		/* caller already holds a reference, add for remainder */
> +		percpu_ref_get_many(&data->q->q_usage_counter, nr - 1);
> +		data->nr_tags -= nr;
> +	} while (data->nr_tags);
>
> I added a loop, and it achieves good results, as before. One question,
> though: since the loop retries tag allocation whenever the required
> number of tags has not been met, is there a risk of an infinite loop?
> I couldn't think of a safer condition to terminate the loop. Do you
> have any suggestions?
Yes, this is what I have in mind. Why do you think there can be an
infinite loop? Each iteration either allocates at least one tag via
blk_mq_get_tags() or returns directly.
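To make the termination argument concrete, here is a toy userspace model
of the do/while loop (toy_get_tags() and toy_batch_alloc() are made-up
stand-ins, not blk-mq functions): every successful toy_get_tags() call
returns a non-zero mask, so nr_tags strictly decreases each round, and a
zero mask exits immediately, so the loop is bounded.

```c
#include <stdint.h>

/*
 * Stand-in for blk_mq_get_tags(): hand out up to `want` tags from a
 * counter-based pool, at most 2 per call to force multiple iterations.
 * Returns a mask with one bit per tag obtained, or 0 on failure.
 */
static uint64_t toy_get_tags(unsigned int *pool, unsigned int want)
{
	unsigned int got = want;

	if (got > *pool)
		got = *pool;
	if (got > 2)
		got = 2;	/* pretend each batch finds at most 2 bits */
	if (!got)
		return 0;	/* allocation failure: caller must bail out */
	*pool -= got;
	return (1ULL << got) - 1;
}

/*
 * Mirror of the proposed do/while loop.  Returns the number of
 * iterations taken, or -1 on allocation failure (the `return NULL`
 * path in the patch).
 */
static int toy_batch_alloc(unsigned int nr_tags, unsigned int pool)
{
	int iters = 0;

	do {
		uint64_t mask = toy_get_tags(&pool, nr_tags);

		if (!mask)
			return -1;
		iters++;
		/* popcount of the mask == tags obtained this round */
		nr_tags -= __builtin_popcountll(mask);
	} while (nr_tags);

	return iters;
}
```

Since nr_tags drops by at least one per iteration and failure returns
at once, the loop runs at most nr_tags times.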
Thanks,
Kuai
>
> Thanks,
> Xue
>
> .
>