Message-ID: <bd9960ab-c9d8-8e5d-c347-8049cdf5708a@gmail.com>
Date: Fri, 8 Jul 2022 15:26:13 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: David Ahern <dsahern@...nel.org>, io-uring@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
Willem de Bruijn <willemb@...gle.com>,
Jens Axboe <axboe@...nel.dk>, kernel-team@...com
Subject: Re: [PATCH net-next v4 00/27] io_uring zerocopy send
On 7/8/22 05:10, David Ahern wrote:
> On 7/7/22 5:49 AM, Pavel Begunkov wrote:
>> NOTE: Not to be picked directly. After getting necessary acks, I'll be working
>> out merging with Jakub and Jens.
>>
>> The patchset implements io_uring zerocopy send. It works with both registered
>> and normal buffers; mixing is allowed but not recommended. Apart from the usual
>> request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
>> the userspace when buffers are freed and can be reused (see API design below),
>> which is delivered into io_uring's Completion Queue. Those "buffer-free"
>> notifications are not necessarily per request, but the userspace has control
>> over it and should explicitly attach a number of requests to a single
>> notification. The series also adds some internal optimisations when used with
>> registered buffers like removing page referencing.
>>
>> From the kernel networking perspective there are two main changes. The first
>> one is passing ubuf_info into the network layer from io_uring (inside of an
>> in-kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
>> caching on the io_uring side, but also helps to avoid cross-referencing
>> and synchronisation problems. The second part is an optional optimisation
>> removing page referencing for requests with registered buffers.
>>
>> Benchmarking with an optimised version of the selftest (see [1]), which sends
>> a bunch of requests, waits for completions and repeats. "+ flush" column posts
>> one additional "buffer-free" notification per request, and just "zc" doesn't
>> post buffer notifications at all.
>>
>> NIC (requests / second):
>> IO size | non-zc  | zc              | zc + flush
>> 4000    | 495134  | 606420 (+22%)   | 558971 (+12%)
>> 1500    | 551808  | 577116 (+4.5%)  | 565803 (+2.5%)
>> 1000    | 584677  | 592088 (+1.2%)  | 560885 (-4%)
>> 600     | 596292  | 598550 (+0.4%)  | 555366 (-6.7%)
>>
>> dummy (requests / second):
>> IO size | non-zc  | zc              | zc + flush
>> 8000    | 1299916 | 2396600 (+84%)  | 2224219 (+71%)
>> 4000    | 1869230 | 2344146 (+25%)  | 2170069 (+16%)
>> 1200    | 2071617 | 2361960 (+14%)  | 2203052 (+6%)
>> 600     | 2106794 | 2381527 (+13%)  | 2195295 (+4%)
>>
>> Previously it also brought a massive performance speedup compared to the
>> msg_zerocopy tool (see [3]), which is probably not super interesting.
>>
>
> can you add a comment that the above results are for UDP.

Oh, right, forgot to add it.

> You dropped comments about TCP testing; any progress there? If not, can
> you relay any issues you are hitting?

Not really a problem, but for me it's bottlenecked at NIC bandwidth
(~3GB/s) for both zc and non-zc and doesn't come anywhere near saturating
a CPU. It was actually benchmarked by my colleague quite a while ago, but
I can't find the numbers. I probably need to at least add localhost
numbers or grab a better server.
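
For anyone rechecking the quoted tables: the percentages are the change
relative to the non-zc column. Below is a minimal Python sketch that
recomputes them from the quoted requests/second figures. The names
(pct_change, report, NIC, DUMMY) are made up for this illustration and are
not part of the patchset or the selftest, and the computed values can
differ from the quoted ones in the last digit because of rounding.

```python
# Requests / second copied from the quoted tables.
# Each entry maps IO size -> (non-zc, zc, zc + flush).
NIC = {
    4000: (495134, 606420, 558971),
    1500: (551808, 577116, 565803),
    1000: (584677, 592088, 560885),
    600: (596292, 598550, 555366),
}

DUMMY = {
    8000: (1299916, 2396600, 2224219),
    4000: (1869230, 2344146, 2170069),
    1200: (2071617, 2361960, 2203052),
    600: (2106794, 2381527, 2195295),
}


def pct_change(baseline, value):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def report(name, table):
    """Print the zc and zc + flush change versus non-zc for every IO size."""
    print(f"{name} (change vs non-zc):")
    for size, (non_zc, zc, zc_flush) in table.items():
        zc_pct = pct_change(non_zc, zc)
        flush_pct = pct_change(non_zc, zc_flush)
        print(f"  {size:>5} | zc {zc_pct:+6.1f}% | zc + flush {flush_pct:+6.1f}%")


if __name__ == "__main__":
    report("NIC", NIC)
    report("dummy", DUMMY)
```

A negative value means the variant completed fewer requests per second
than non-zc, as in the 1000 and 600 byte "zc + flush" rows on the NIC.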
--
Pavel Begunkov