[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250903084135.2860-1-xue01.he@samsung.com>
Date: Wed, 3 Sep 2025 08:41:35 +0000
From: Xue He <xue01.he@...sung.com>
To: yukuai1@...weicloud.com, axboe@...nel.dk
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
yukuai3@...wei.com
Subject: Re: [PATCH] block: plug attempts to batch allocate tags multiple
times
On 2025/09/02 08:47 AM, Yu Kuai wrote:
>On 2025/09/01 16:22, Xue He wrote:
......
>> This patch aims to allow the remaining I/O operations to retry batch
>> allocation of tags, reducing the overhead caused by multiple
>> individual tag allocations.
>>
>> ------------------------------------------------------------------------
>> test result
>> During testing of the PCIe Gen4 SSD Samsung PM9A3, the perf tool
>> observed CPU improvements. The CPU usage of the original function
>> _blk_mq_alloc_requests function was 1.39%, which decreased to 0.82%
>> after modification.
>>
>> Additionally, performance variations were observed on different devices.
>> workload:randread
>> blocksize:4k
>> thread:1
>> ------------------------------------------------------------------------
>> PCIe Gen3 SSD PCIe Gen4 SSD PCIe Gen5 SSD
>> native kernel 553k iops 633k iops 793k iops
>> modified 553k iops 635k iops 801k iops
>>
>> with Optane SSDs, the performance like
>> two device one thread
>> cmd :sudo taskset -c 0 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1
>> -n1 -r4 /dev/nvme0n1 /dev/nvme1n1
>>
>
>How many hw_queues and how many tags in each hw_queues in your nvme?
>I feel it's unlikely that tags can be exhausted, usually cpu will become
>bottleneck first.
the information of my nvme like this:
number of CPU: 16
memory: 16G
nvme nvme0: 16/0/16 default/read/poll queue
cat /sys/class/nvme/nvme0/nvme0n1/queue/nr_requests
1023
In more precise terms, I think it is not that the tags are fully exhausted,
but rather that after scanning the bitmap for free bits, the remaining
contiguous bits are nsufficient to meet the requirement (have but not enough).
The specific function involved is __sbitmap_queue_get_batch in lib/sbitmap.c.
get_mask = ((1UL << nr_tags) - 1) << nr;
if (nr_tags > 1) {
printk("before %ld\n", get_mask);
}
while (!atomic_long_try_cmpxchg(ptr, &val,
get_mask | val))
;
get_mask = (get_mask & ~val) >> nr;
where during the batch acquisition of contiguous free bits, an atomic operation
is performed, resulting in the actual tag_mask obtained differing from the
originally requested one.
Am I missing something?
>> base: 6.4 Million IOPS
>> patch: 6.49 Million IOPS
>>
>> two device two thread
>> cmd: sudo taskset -c 0 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1
>> -n1 -r4 /dev/nvme0n1 /dev/nvme1n1
>>
>> base: 7.34 Million IOPS
>> patch: 7.48 Million IOPS
>> -------------------------------------------------------------------------
>>
>> Signed-off-by: hexue <xue01.he@...sung.com>
>> ---
>> block/blk-mq.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index b67d6c02eceb..1fb280764b76 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -587,9 +587,9 @@ static struct request *blk_mq_rq_cache_fill(struct request_queue *q,
>> if (blk_queue_enter(q, flags))
>> return NULL;
>>
>> - plug->nr_ios = 1;
>> -
>> rq = __blk_mq_alloc_requests(&data);
>> + plug->nr_ios = data.nr_tags;
>> +
>> if (unlikely(!rq))
>> blk_queue_exit(q);
>> return rq;
>> @@ -3034,11 +3034,13 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
>>
>> if (plug) {
>> data.nr_tags = plug->nr_ios;
>> - plug->nr_ios = 1;
>> data.cached_rqs = &plug->cached_rqs;
>> }
>>
>> rq = __blk_mq_alloc_requests(&data);
>> + if (plug)
>> + plug->nr_ios = data.nr_tags;
>> +
>> if (unlikely(!rq))
>> rq_qos_cleanup(q, bio);
>> return rq;
>>
>
>In __blk_mq_alloc_requests(), if __blk_mq_alloc_requests_batch() failed,
>data->nr_tags is set to 1, so plug->nr_ios = data.nr_tags will still set
>plug->nr_ios to 1 in this case.
>
>What am I missing?
yes, you are right, if __blk_mq_alloc_requests_batch() failed, it will set
to 1. However, in this case, it did not fail to execute; instead, the
allocated number of tags was insufficient, as only a partial number were
allocated. Therefore, the function is considered successfully executed.
>Thanks,
>Kuai
>
Thanks,
Xue
Powered by blists - more mailing lists