lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2e475d9f-8d39-43f4-adc5-501897c951a8@kernel.dk>
Date: Wed, 9 Oct 2024 09:27:07 -0600
From: Jens Axboe <axboe@...nel.dk>
To: David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org,
 netdev@...r.kernel.org
Cc: Pavel Begunkov <asml.silence@...il.com>, Jakub Kicinski
 <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>, David Ahern <dsahern@...nel.org>,
 Mina Almasry <almasrymina@...gle.com>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx

On 10/7/24 4:15 PM, David Wei wrote:
> ===========
> Performance
> ===========
> 
> Test setup:
> * AMD EPYC 9454
> * Broadcom BCM957508 200G
> * Kernel v6.11 base [2]
> * liburing fork [3]
> * kperf fork [4]
> * 4K MTU
> * Single TCP flow
> 
> With application thread + net rx softirq pinned to _different_ cores:
> 
> epoll
> 82.2 Gbps
> 
> io_uring
> 116.2 Gbps (+41%)
> 
> Pinned to _same_ core:
> 
> epoll
> 62.6 Gbps
> 
> io_uring
> 80.9 Gbps (+29%)

I'll review the io_uring bits in detail, but I did take a quick look and
overall it looks really nice.

I decided to give this a spin, as I noticed that Broadcom now has a
230.x firmware release out that supports this. Hence no dependencies on
that anymore, outside of some pain getting the fw updated. Here are my
test setup details:

Receiver:
AMD EPYC 9754 (recei
Broadcom P2100G
-git + this series + the bnxt series referenced

Sender:
Intel(R) Xeon(R) Platinum 8458P
Broadcom P2100G
-git

Test:
kperf with David's patches to support io_uring zc. Eg single flow TCP,
just testing bandwidth. A single cpu/thread being used on both the
receiver and sender side.

non-zc
60.9 Gbps

io_uring + zc
97.1 Gbps

or +59% faster. There's quite a bit of IRQ side work, I'm guessing I
might need to tune it a bit. But it Works For Me, and the results look
really nice.

I did run into an issue with the bnxt driver defaulting to shared tx/rx
queues, and it not working for me in that configuration. Once I disabled
that, it worked fine. This may or may not be an issue with the flow rule
to direct the traffic, the driver queue start, or something else. Don't
know for sure, will need to check with the driver folks. Once sorted, I
didn't see any issues with the code in the patchset.

-- 
Jens Axboe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ