Message-ID: <c9838a68-7443-40d8-a1b7-492a12e6f9dc@kernel.dk>
Date: Wed, 16 Apr 2025 14:01:47 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Nitesh Shetty <nitheshshetty@...il.com>
Cc: Pavel Begunkov <asml.silence@...il.com>,
Nitesh Shetty <nj.shetty@...sung.com>, gost.dev@...sung.com,
io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer
On 4/16/25 1:57 PM, Nitesh Shetty wrote:
> On Wed, Apr 16, 2025 at 11:55 PM Jens Axboe <axboe@...nel.dk> wrote:
>>
>> On 4/16/25 9:07 AM, Jens Axboe wrote:
>>> On 4/16/25 9:03 AM, Pavel Begunkov wrote:
>>>> On 4/16/25 06:44, Nitesh Shetty wrote:
>>>>> Sending the exact nr_segs avoids the bio split check and processing in
>>>>> the block layer, which takes around 5%[1] of overall CPU utilization.
>>>>>
>>>>> In our setup, we see an overall improvement in IOPS from 7.15M to 7.65M [2]
>>>>> and 5% less CPU utilization.
>>>>>
>>>>> [1]
>>>>> 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at
>>>>> 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw
>>>>> 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split
>>>>>
>>>>> [2]
>>>>> sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
>>>>> -r4 /dev/nvme0n1 /dev/nvme1n1
>>>>>
>>>>> Signed-off-by: Nitesh Shetty <nj.shetty@...sung.com>
>>>>> ---
>>>>> io_uring/rsrc.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>>>> index b36c8825550e..6fd3a4a85a9c 100644
>>>>> --- a/io_uring/rsrc.c
>>>>> +++ b/io_uring/rsrc.c
>>>>> @@ -1096,6 +1096,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>>>>> iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
>>>>> }
>>>>> }
>>>>> + iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
>>>>> + iter->count + ((1UL << imu->folio_shift) - 1)) /
>>>>> + (1UL << imu->folio_shift);
>>>>
>>>> That's not going to work with ->is_kbuf as the segments are not uniform in
>>>> size.
>>>
>>> Oops yes good point.
>>
>> How about something like this? Trims superfluous end segments, if they
>> exist. The 'offset' section already trimmed the front parts. For
>> !is_kbuf that should be simple math, like in Nitesh's patch. For
>> is_kbuf, iterate them.
>>
>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>> index bef66e733a77..e482ea1e22a9 100644
>> --- a/io_uring/rsrc.c
>> +++ b/io_uring/rsrc.c
>> @@ -1036,6 +1036,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> struct io_mapped_ubuf *imu,
>> u64 buf_addr, size_t len)
>> {
>> + const struct bio_vec *bvec;
>> unsigned int folio_shift;
>> size_t offset;
>> int ret;
>> @@ -1052,9 +1053,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> * Might not be a start of buffer, set size appropriately
>> * and advance us to the beginning.
>> */
>> + bvec = imu->bvec;
>> offset = buf_addr - imu->ubuf;
>> folio_shift = imu->folio_shift;
>> - iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len);
>> + iov_iter_bvec(iter, ddir, bvec, imu->nr_bvecs, offset + len);
>>
>> if (offset) {
>> /*
>> @@ -1073,7 +1075,6 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> * since we can just skip the first segment, which may not
>> * be folio_size aligned.
>> */
>> - const struct bio_vec *bvec = imu->bvec;
>>
>> /*
>> * Kernel buffer bvecs, on the other hand, don't necessarily
>> @@ -1099,6 +1100,27 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> }
>> }
>>
>> + /*
>> + * Offset trimmed front segments too, if any, now trim the tail.
>> + * For is_kbuf we'll iterate them as they may be different sizes,
>> + * otherwise we can just do straight up math.
>> + */
>> + if (len + offset < imu->len) {
>> + bvec = iter->bvec;
>> + if (imu->is_kbuf) {
>> + while (len > bvec->bv_len) {
>> + len -= bvec->bv_len;
>> + bvec++;
>> + }
>> + iter->nr_segs = 1 + bvec - iter->bvec;
>> + } else {
>> + size_t vec_len;
>> +
>> + vec_len = bvec->bv_offset + iter->iov_offset +
>> + iter->count + ((1UL << folio_shift) - 1);
>> + iter->nr_segs = vec_len >> folio_shift;
>> + }
>> + }
>> return 0;
>> }
> This might not be needed for is_kbuf, as it already updates nr_segs
> inside iov_iter_advance.
How so? If 'offset' is true, then yes it'd skip the front, but it
doesn't skip the end part. And if 'offset' is 0, then no advancing is
done in the first place - which does make sense, as it's just advancing
from the front.
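
To put numbers on it, here's a rough userspace sketch (simplified structs
and made-up names, not the kernel's actual bio_vec/iov_iter API) of what
advancing does versus what the tail trim has to do:

/*
 * Rough userspace model, not the kernel structs: shows that advancing
 * only drops leading segments, while capping nr_segs for the tail is a
 * separate step.
 */
#include <stdio.h>
#include <stddef.h>

struct sketch_bvec { size_t bv_len; };

struct sketch_iter {
	const struct sketch_bvec *bvec;
	unsigned int nr_segs;
	size_t iov_offset;
	size_t count;
};

/* What an advance does for a bvec iter: skip whole leading segments,
 * keep the leftover as an offset into the first remaining one. */
static void sketch_advance(struct sketch_iter *it, size_t off)
{
	it->count -= off;
	while (off && off >= it->bvec->bv_len) {
		off -= it->bvec->bv_len;
		it->bvec++;
		it->nr_segs--;
	}
	it->iov_offset = off;
}

/* Tail trim: walk until the remaining bytes are covered, cap nr_segs. */
static void sketch_trim_tail(struct sketch_iter *it)
{
	const struct sketch_bvec *bv = it->bvec;
	size_t len = it->count + it->iov_offset;

	while (len > bv->bv_len) {
		len -= bv->bv_len;
		bv++;
	}
	it->nr_segs = 1 + bv - it->bvec;
}

int main(void)
{
	/* 4 x 4K registered buffer, 6144 byte request at offset 5120 */
	struct sketch_bvec bv[4] = { {4096}, {4096}, {4096}, {4096} };
	struct sketch_iter it = { bv, 4, 0, 5120 + 6144 };

	sketch_advance(&it, 5120);
	printf("after advance: nr_segs=%u\n", it.nr_segs);	/* 3 */
	sketch_trim_tail(&it);
	printf("after trim:    nr_segs=%u\n", it.nr_segs);	/* 2 */
	return 0;
}

Advancing drops the leading segment but leaves the trailing ones counted;
only the explicit trim gets nr_segs down to what the request actually
spans.
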
> How about changing something like this?
You can't hide this in the if (offset) section...
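To stick with the sketch above: a 6144-byte read at offset 0 of that 16K
buffer never advances anything, since offset is 0, yet nr_segs still has
to drop from 4 to 2. That's why the trim has to live outside the
if (offset) block.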
--
Jens Axboe