Message-ID: <b57dd3b9-b607-46ab-a2d0-98aedb2772f7@gmail.com>
Date: Wed, 9 Oct 2024 16:49:15 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jens Axboe <axboe@...nel.dk>, David Ahern <dsahern@...nel.org>,
 David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org, netdev@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 Mina Almasry <almasrymina@...gle.com>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx

On 10/9/24 16:43, Jens Axboe wrote:
> On 10/9/24 9:38 AM, David Ahern wrote:
>> On 10/9/24 9:27 AM, Jens Axboe wrote:
>>> On 10/7/24 4:15 PM, David Wei wrote:
>>>> ===========
>>>> Performance
>>>> ===========
>>>>
>>>> Test setup:
>>>> * AMD EPYC 9454
>>>> * Broadcom BCM957508 200G
>>>> * Kernel v6.11 base [2]
>>>> * liburing fork [3]
>>>> * kperf fork [4]
>>>> * 4K MTU
>>>> * Single TCP flow
>>>>
>>>> With application thread + net rx softirq pinned to _different_ cores:
>>>>
>>>> epoll
>>>> 82.2 Gbps
>>>>
>>>> io_uring
>>>> 116.2 Gbps (+41%)
>>>>
>>>> Pinned to _same_ core:
>>>>
>>>> epoll
>>>> 62.6 Gbps
>>>>
>>>> io_uring
>>>> 80.9 Gbps (+29%)
>>>
>>> I'll review the io_uring bits in detail, but I did take a quick look and
>>> overall it looks really nice.
>>>
>>> I decided to give this a spin, as I noticed that Broadcom now has a
>>> 230.x firmware release out that supports this. Hence no dependencies on
>>> that anymore, outside of some pain getting the fw updated. Here are my
>>> test setup details:
>>>
>>> Receiver:
>>> AMD EPYC 9754 (recei
>>> Broadcom P2100G
>>> -git + this series + the bnxt series referenced
>>>
>>> Sender:
>>> Intel(R) Xeon(R) Platinum 8458P
>>> Broadcom P2100G
>>> -git
>>>
>>> Test:
>>> kperf with David's patches to support io_uring zc. E.g., single-flow TCP,
>>> just testing bandwidth. A single CPU/thread is used on both the
>>> receiver and sender side.
>>>
>>> non-zc
>>> 60.9 Gbps
>>>
>>> io_uring + zc
>>> 97.1 Gbps
>>
>> So line rate? Did you look at whether there is CPU to spare, meaning it
>> will report higher speeds with a 200G setup?
> 
> Yep, basically line rate: I get 97-98Gbps. I originally used a slower box
> as the sender, but then you're capped by the non-zc sender being too
> slow. The Intel box does better, but it's still basically maxing out the
> sender at this point. So yeah, with a faster (or more efficient) sender,
> I have no doubt this will go much higher per thread, if the link bandwidth
> were there. When I looked at CPU usage for the receiver, the thread itself
> is using ~30% CPU. And then there's some softirq/irq time outside of that,
> but I'd expect that to amortize at higher bit rates too.
> 
> My NIC does have two 100G ports, so it might warrant a bit more testing...

If you haven't done it already, I'd also pin softirq processing to
the same CPU as the app so we measure the full stack. kperf has an
option IIRC.
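For readers reproducing this without relying on a specific kperf flag (the option isn't named here, so none is assumed), the application side of "same CPU" pinning can be done with the Linux `sched_setaffinity()` call. The sketch below is a minimal illustration; `pin_to_cpu()` and `is_pinned_to()` are hypothetical helper names. The softirq side is a separate knob: NET_RX softirq work typically runs on the CPU that services the RX queue's interrupt (absent RPS or busy polling), so that IRQ would be steered to the same CPU, e.g. by writing the CPU number to `/proc/irq/<N>/smp_affinity_list`, where `<N>` is the queue's IRQ from `/proc/interrupts` (a placeholder, and irqbalance may need to be stopped so it doesn't override it).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread" for sched_setaffinity(). */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return 1 if the calling thread's affinity mask is exactly {cpu}. */
static int is_pinned_to(int cpu)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

A receiver would call `pin_to_cpu(n)` once before entering its receive loop, with `n` matching the CPU the RX queue's IRQ was steered to, so that app and softirq time are both charged to one core and the measurement covers the full stack.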

-- 
Pavel Begunkov
