Message-ID: <b57dd3b9-b607-46ab-a2d0-98aedb2772f7@gmail.com>
Date: Wed, 9 Oct 2024 16:49:15 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Jens Axboe <axboe@...nel.dk>, David Ahern <dsahern@...nel.org>,
David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org, netdev@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Mina Almasry <almasrymina@...gle.com>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx
On 10/9/24 16:43, Jens Axboe wrote:
> On 10/9/24 9:38 AM, David Ahern wrote:
>> On 10/9/24 9:27 AM, Jens Axboe wrote:
>>> On 10/7/24 4:15 PM, David Wei wrote:
>>>> ===========
>>>> Performance
>>>> ===========
>>>>
>>>> Test setup:
>>>> * AMD EPYC 9454
>>>> * Broadcom BCM957508 200G
>>>> * Kernel v6.11 base [2]
>>>> * liburing fork [3]
>>>> * kperf fork [4]
>>>> * 4K MTU
>>>> * Single TCP flow
>>>>
>>>> With application thread + net rx softirq pinned to _different_ cores:
>>>>
>>>> epoll
>>>> 82.2 Gbps
>>>>
>>>> io_uring
>>>> 116.2 Gbps (+41%)
>>>>
>>>> Pinned to _same_ core:
>>>>
>>>> epoll
>>>> 62.6 Gbps
>>>>
>>>> io_uring
>>>> 80.9 Gbps (+29%)
>>>
>>> I'll review the io_uring bits in detail, but I did take a quick look and
>>> overall it looks really nice.
>>>
>>> I decided to give this a spin, as I noticed that Broadcom now has a
>>> 230.x firmware release out that supports this. Hence no dependencies on
>>> that anymore, outside of some pain getting the fw updated. Here are my
>>> test setup details:
>>>
>>> Receiver:
>>> AMD EPYC 9754 (receiver)
>>> Broadcom P2100G
>>> -git + this series + the bnxt series referenced
>>>
>>> Sender:
>>> Intel(R) Xeon(R) Platinum 8458P
>>> Broadcom P2100G
>>> -git
>>>
>>> Test:
>>> kperf with David's patches to support io_uring zc. Eg single flow TCP,
>>> just testing bandwidth. A single cpu/thread being used on both the
>>> receiver and sender side.
>>>
>>> non-zc
>>> 60.9 Gbps
>>>
>>> io_uring + zc
>>> 97.1 Gbps
>>
>> So line rate? Did you look at whether there is CPU to spare? Meaning it
>> will report higher speeds with a 200G setup?
>
> Yep basically line rate, I get 97-98Gbps. I originally used a slower box
> as the sender, but then you're capped on the non-zc sender being too
> slow. The Intel box does better, but it's still basically maxing out the
> sender at this point. So yeah, with a faster (or more efficient) sender,
> I have no doubt this will go much higher per thread, if the link bw is
> there. When I looked at CPU usage for the receiver, the thread itself is
> using ~30% CPU. And then there's some softirq/irq time outside of that,
> but that should amortize with higher bps rates too, I'd expect.
>
> My nic does have 2 100G ports, so might warrant a bit more testing...
If you haven't done it already, I'd also pin softirq processing to
the same CPU as the app so we measure the full stack. kperf has an
option IIRC.
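
Roughly something like the sketch below, if you'd rather do it by hand
(not from this thread; the CPU and IRQ numbers are made up, the real rx
queue IRQ is in /proc/interrupts):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* assumed values, for illustration only */
#define CPU	4	/* core to pin both the app and rx softirq to */
#define RX_IRQ	123	/* NIC rx queue IRQ, see /proc/interrupts */

int main(void)
{
	cpu_set_t set;
	char path[64];
	FILE *f;

	/* pin the application (receive) thread */
	CPU_ZERO(&set);
	CPU_SET(CPU, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	/*
	 * Steer the rx queue interrupt, and hence its NAPI/softirq
	 * processing, to the same core.
	 */
	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", RX_IRQ);
	f = fopen(path, "w");
	if (!f) {
		perror("fopen");
		return 1;
	}
	fprintf(f, "%x\n", 1u << CPU);
	fclose(f);

	/* ... run the receive loop from here ... */
	return 0;
}

The smp_affinity write needs root, and irqbalance can rewrite it behind
your back, so it's best disabled for the run.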
--
Pavel Begunkov