Message-ID: <345e9d6e-8bb2-3d43-4c3c-cc16fa7dd8c1@huaweicloud.com>
Date: Tue, 2 Sep 2025 16:47:25 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Xue He <xue01.he@...sung.com>, axboe@...nel.dk
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] block: plug attempts to batch allocate tags multiple
times
Hi,
On 2025/09/01 16:22, Xue He wrote:
> From: hexue <xue01.he@...sung.com>
>
> In the existing plug mechanism, tags are allocated in batches based on
> the number of requests. However, testing has shown that the plug
> attempts batch allocation of tags only once, at the beginning of a
> batch of I/O operations. Since the tag_mask does not always have
> enough available tags to satisfy the requested count, a full batch
> allocation is not guaranteed to succeed each time. The remaining tags
> are then allocated individually (which happens frequently), incurring
> a single-tag allocation overhead for each of them.
>
> This patch aims to allow the remaining I/O operations to retry batch
> allocation of tags, reducing the overhead caused by multiple
> individual tag allocations.
>
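For context, the pre-patch flow in blk_mq_get_new_requests() looks
roughly like the following (a simplified sketch taken from the lines
the diff below removes; details elided):

	if (plug) {
		data.nr_tags = plug->nr_ios;	/* ask for the whole batch */
		plug->nr_ios = 1;		/* reset: later allocations ask for one */
		data.cached_rqs = &plug->cached_rqs;
	}
	rq = __blk_mq_alloc_requests(&data);
	/*
	 * __blk_mq_alloc_requests() tries __blk_mq_alloc_requests_batch()
	 * first, which may get fewer tags than requested from the
	 * tag_mask. Once the cached requests are consumed, every
	 * remaining request of the batch is allocated with nr_tags == 1.
	 */
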
> ------------------------------------------------------------------------
> test result
> During testing of a PCIe Gen4 SSD (Samsung PM9A3), the perf tool
> showed a reduction in CPU usage: the share of the
> __blk_mq_alloc_requests() function dropped from 1.39% to 0.82% after
> the modification.
>
> Additionally, performance variations were observed on different devices.
> workload: randread
> blocksize: 4k
> threads: 1
> ------------------------------------------------------------------------
>                 PCIe Gen3 SSD   PCIe Gen4 SSD   PCIe Gen5 SSD
> native kernel   553k iops       633k iops       793k iops
> modified        553k iops       635k iops       801k iops
>
> With Optane SSDs, the performance is as follows:
> two devices, one thread
> cmd: sudo taskset -c 0 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1
> -n1 -r4 /dev/nvme0n1 /dev/nvme1n1
>
How many hw_queues does your nvme have, and how many tags are there in
each hw_queue? I feel it's unlikely that tags can be exhausted; usually
the cpu will become the bottleneck first.
> base: 6.4 Million IOPS
> patch: 6.49 Million IOPS
>
> two devices, two threads
> cmd: sudo taskset -c 0 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1
> -n1 -r4 /dev/nvme0n1 /dev/nvme1n1
>
> base: 7.34 Million IOPS
> patch: 7.48 Million IOPS
> -------------------------------------------------------------------------
>
> Signed-off-by: hexue <xue01.he@...sung.com>
> ---
> block/blk-mq.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b67d6c02eceb..1fb280764b76 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -587,9 +587,9 @@ static struct request *blk_mq_rq_cache_fill(struct request_queue *q,
> if (blk_queue_enter(q, flags))
> return NULL;
>
> - plug->nr_ios = 1;
> -
> rq = __blk_mq_alloc_requests(&data);
> + plug->nr_ios = data.nr_tags;
> +
> if (unlikely(!rq))
> blk_queue_exit(q);
> return rq;
> @@ -3034,11 +3034,13 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
>
> if (plug) {
> data.nr_tags = plug->nr_ios;
> - plug->nr_ios = 1;
> data.cached_rqs = &plug->cached_rqs;
> }
>
> rq = __blk_mq_alloc_requests(&data);
> + if (plug)
> + plug->nr_ios = data.nr_tags;
> +
> if (unlikely(!rq))
> rq_qos_cleanup(q, bio);
> return rq;
>
In __blk_mq_alloc_requests(), if __blk_mq_alloc_requests_batch() fails,
data->nr_tags is set to 1, so plug->nr_ios = data.nr_tags will still
set plug->nr_ios to 1 in that case.
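The relevant code, paraphrased from __blk_mq_alloc_requests()
(surrounding lines elided):

	if (data->nr_tags > 1) {
		rq = __blk_mq_alloc_requests_batch(data);
		if (rq)
			return rq;
		/* batch allocation failed: fall back to a single tag */
		data->nr_tags = 1;
	}
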
What am I missing?
Thanks,
Kuai