[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40a0bbd6-10c7-45bd-9129-51c1ea99a063@kernel.dk>
Date: Wed, 16 Apr 2025 12:25:49 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Pavel Begunkov <asml.silence@...il.com>,
Nitesh Shetty <nj.shetty@...sung.com>
Cc: gost.dev@...sung.com, nitheshshetty@...il.com, io-uring@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer
On 4/16/25 9:07 AM, Jens Axboe wrote:
> On 4/16/25 9:03 AM, Pavel Begunkov wrote:
>> On 4/16/25 06:44, Nitesh Shetty wrote:
>>> Sending exact nr_segs, avoids bio split check and processing in
>>> block layer, which takes around 5%[1] of overall CPU utilization.
>>>
>>> In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2]
>>> and 5% less CPU utilization.
>>>
>>> [1]
>>> 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at
>>> 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw
>>> 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split
>>>
>>> [2]
>>> sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
>>> -r4 /dev/nvme0n1 /dev/nvme1n1
>>>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@...sung.com>
>>> ---
>>> io_uring/rsrc.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>> index b36c8825550e..6fd3a4a85a9c 100644
>>> --- a/io_uring/rsrc.c
>>> +++ b/io_uring/rsrc.c
>>> @@ -1096,6 +1096,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>>> iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
>>> }
>>> }
>>> + iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
>>> + iter->count + ((1UL << imu->folio_shift) - 1)) /
>>> + (1UL << imu->folio_shift);
>>
>> That's not going to work with ->is_kbuf as the segments are not uniform in
>> size.
>
> Oops yes good point.
How about something like this? Trims superflous end segments, if they
exist. The 'offset' section already trimmed the front parts. For
!is_kbuf that should be simple math, like in Nitesh's patch. For
is_kbuf, iterate them.
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index bef66e733a77..e482ea1e22a9 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1036,6 +1036,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
struct io_mapped_ubuf *imu,
u64 buf_addr, size_t len)
{
+ const struct bio_vec *bvec;
unsigned int folio_shift;
size_t offset;
int ret;
@@ -1052,9 +1053,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
* Might not be a start of buffer, set size appropriately
* and advance us to the beginning.
*/
+ bvec = imu->bvec;
offset = buf_addr - imu->ubuf;
folio_shift = imu->folio_shift;
- iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len);
+ iov_iter_bvec(iter, ddir, bvec, imu->nr_bvecs, offset + len);
if (offset) {
/*
@@ -1073,7 +1075,6 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
* since we can just skip the first segment, which may not
* be folio_size aligned.
*/
- const struct bio_vec *bvec = imu->bvec;
/*
* Kernel buffer bvecs, on the other hand, don't necessarily
@@ -1099,6 +1100,27 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
}
}
+ /*
+ * Offset trimmed front segments too, if any, now trim the tail.
+ * For is_kbuf we'll iterate them as they may be different sizes,
+ * otherwise we can just do straight up math.
+ */
+ if (len + offset < imu->len) {
+ bvec = iter->bvec;
+ if (imu->is_kbuf) {
+ while (len > bvec->bv_len) {
+ len -= bvec->bv_len;
+ bvec++;
+ }
+ iter->nr_segs = bvec - iter->bvec;
+ } else {
+ size_t vec_len;
+
+ vec_len = bvec->bv_offset + iter->iov_offset +
+ iter->count + ((1UL << folio_shift) - 1);
+ iter->nr_segs = vec_len >> folio_shift;
+ }
+ }
return 0;
}
--
Jens Axboe
Powered by blists - more mailing lists