Message-ID: <d8e729ed-0bc8-4094-ad22-36c6312def25@mojatatu.com>
Date: Fri, 11 Oct 2024 11:28:07 -0300
From: Pedro Tammela <pctammela@...atatu.com>
To: David Wei <dw@...idwei.uk>, Mina Almasry <almasrymina@...gle.com>
Cc: io-uring@...r.kernel.org, netdev@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>, Pavel Begunkov <asml.silence@...il.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <hawk@...nel.org>, David Ahern <dsahern@...nel.org>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx
On 10/10/2024 21:35, David Wei wrote:
> On 2024-10-09 11:21, Pedro Tammela wrote:
>> On 09/10/2024 13:55, Mina Almasry wrote:
>>> [...]
>>>
>>> If not, I would like to see a comparison between TCP RX zerocopy and
>>> this new io-uring zerocopy. For Google for example we use the TCP RX
>>> zerocopy, I would like to see perf numbers possibly motivating us to
>>> move to this new thing.
>>>
>>> [1] https://lwn.net/Articles/752046/
>>>
>>
>> Hi!
>>
>> From my own testing, the TCP RX Zerocopy is quite heavy on the page unmapping side. Since the io_uring implementation is expected to be lighter (see patch 11), I would expect a simple comparison to show better numbers for io_uring.
>
> Hi Pedro, I will add TCP_ZEROCOPY_RECEIVE to kperf and compare in the
> next patchset.
>
>>
>> To be fair to the existing implementation, it would then be needed to be paired with some 'real' computation, but that varies a lot. As we presented in netdevconf this year, HW-GRO eventually was the best option for us (no app changes, etc...) but still a case by case decision.
>
> Why is there a need to add some computation to the benchmarks? A
> benchmark is meant to be just that - a simple comparison that just looks
> at the overheads of the stack.
For the use case we saw, streaming lots of data with zc, the RX pages
would linger in processing for a reasonable time, so the unmap cost was
amortized in the hot path. Our simple benchmark did not account for that.
So for Mina's case, I guess the only way to know for sure if it's worth
it is to implement the io_uring approach and compare.
> Real workloads are complex, I don't see
> this feature as a universal win in all cases, but very workload and
> userspace architecture dependent.
100% agree here, that's our experience so far as well.
Just wanted to share this sentiment in my previous email.
I personally believe the io_uring approach will encompass more use cases
than the existing implementation.
>
> As for HW-GRO, whynotboth.jpg?
For us the cost of changing the apps/services to accommodate rx zc was
prohibitive for now, which led us to stick with HW-GRO.
IIRC, you mentioned in netdevconf Meta uses a library for RPC, but we
don't have this luxury :/