Message-ID: <f4da5d66-9ae4-4a61-8c6c-394009c12c4c@kernel.dk>
Date: Wed, 16 Apr 2025 08:49:07 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Nitesh Shetty <nj.shetty@...sung.com>,
 Pavel Begunkov <asml.silence@...il.com>
Cc: gost.dev@...sung.com, nitheshshetty@...il.com, io-uring@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer

On 4/16/25 8:43 AM, Jens Axboe wrote:
> On 4/16/25 8:19 AM, Jens Axboe wrote:
>> On 4/15/25 11:44 PM, Nitesh Shetty wrote:
>>> Sending the exact nr_segs avoids the bio split check and processing in
>>> the block layer, which takes around 5% [1] of overall CPU utilization.
>>>
>>> In our setup, we see an overall IOPS improvement from 7.15M to 7.65M [2]
>>> and 5% less CPU utilization.
>>>
>>> [1]
>>>      3.52%  io_uring         [kernel.kallsyms]     [k] bio_split_rw_at
>>>      1.42%  io_uring         [kernel.kallsyms]     [k] bio_split_rw
>>>      0.62%  io_uring         [kernel.kallsyms]     [k] bio_submit_split
>>>
>>> [2]
>>> sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
>>> -r4 /dev/nvme0n1 /dev/nvme1n1
>>
>> This must be a regression. Do you know which block/io_uring side commit
>> caused the splits to be done for fixed buffers?
>>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@...sung.com>
>>> ---
>>>  io_uring/rsrc.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>> index b36c8825550e..6fd3a4a85a9c 100644
>>> --- a/io_uring/rsrc.c
>>> +++ b/io_uring/rsrc.c
>>> @@ -1096,6 +1096,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>>>  			iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
>>>  		}
>>>  	}
>>> +	iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
>>> +		iter->count + ((1UL << imu->folio_shift) - 1)) /
>>> +		(1UL << imu->folio_shift);
>>
>> 	iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
>> 		iter->count + ((1UL << imu->folio_shift) - 1)) >> imu->folio_shift;
>>
>> to avoid a division, seems worthwhile?
> 
> And we should be able to drop the ->nr_segs assignment in the above
> section as well with this change.
> 
> Tested on a box here, previously:
> 
> IOPS=99.19M, BW=48.43GiB/s, IOS/call=32/31
> IOPS=99.48M, BW=48.57GiB/s, IOS/call=32/32
> IOPS=99.43M, BW=48.55GiB/s, IOS/call=32/32
> IOPS=99.48M, BW=48.57GiB/s, IOS/call=31/31
> IOPS=99.49M, BW=48.58GiB/s, IOS/call=32/32
> 
> and with the fix:
> 
> IOPS=103.28M, BW=50.43GiB/s, IOS/call=32/31
> IOPS=103.18M, BW=50.38GiB/s, IOS/call=32/32
> IOPS=103.22M, BW=50.40GiB/s, IOS/call=32/31
> IOPS=103.18M, BW=50.38GiB/s, IOS/call=31/32
> IOPS=103.19M, BW=50.38GiB/s, IOS/call=31/32
> IOPS=103.12M, BW=50.35GiB/s, IOS/call=32/31
> 
> and I do indeed see the same ~4% time wasted on splits.

Applied this with a pre-patch to avoid overly long lines, and
with the redundant nr_segs removed and the division eliminated.
See here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-6.15
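
For reference, the rounding discussed in this thread can be sketched as a
standalone userspace model. This is not the kernel code: the function names
nr_segs_div() and nr_segs_shift() are hypothetical, and the sketch assumes
every segment except the first and last spans exactly one folio of
(1 << folio_shift) bytes. Because the folio size is a power of two, rounding
up with a division and rounding up with a right shift should give the same
count.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of folio-sized segments covered by a
 * transfer, written with a division as in the originally posted patch.
 */
size_t nr_segs_div(size_t bv_offset, size_t iov_offset, size_t count,
		   unsigned int folio_shift)
{
	size_t folio_size = (size_t)1 << folio_shift;

	/* Round the end of the range up to the next folio boundary. */
	return (bv_offset + iov_offset + count + folio_size - 1) / folio_size;
}

/*
 * Hypothetical helper: the same count, with the division replaced by a
 * right shift. Valid because the folio size is a power of two.
 */
size_t nr_segs_shift(size_t bv_offset, size_t iov_offset, size_t count,
		     unsigned int folio_shift)
{
	size_t folio_mask = ((size_t)1 << folio_shift) - 1;

	return (bv_offset + iov_offset + count + folio_mask) >> folio_shift;
}
```

With 4 KiB folios (folio_shift = 12), a 512-byte transfer starting at offset
0 stays within one segment, while the same transfer starting at offset 4000
crosses a folio boundary and needs two.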

-- 
Jens Axboe
