netdev - Re: [PATCH v1 00/15] io

Message-ID: <2ff04413-9826-4696-9c8a-7a40cd886aae@davidwei.uk>
Date: Thu, 10 Oct 2024 17:35:47 -0700
From: David Wei <dw@...idwei.uk>
To: Pedro Tammela <pctammela@...atatu.com>,
 Mina Almasry <almasrymina@...gle.com>
Cc: io-uring@...r.kernel.org, netdev@...r.kernel.org,
 Jens Axboe <axboe@...nel.dk>, Pavel Begunkov <asml.silence@...il.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>, David Ahern <dsahern@...nel.org>,
 David Wei <dw@...idwei.uk>
Subject: Re: [PATCH v1 00/15] io_uring zero copy rx

On 2024-10-09 11:21, Pedro Tammela wrote:
> On 09/10/2024 13:55, Mina Almasry wrote:
>> [...]
>>
>> If not, I would like to see a comparison between TCP RX zerocopy and
>> this new io-uring zerocopy. For Google for example we use the TCP RX
>> zerocopy, I would like to see perf numbers possibly motivating us to
>> move to this new thing.
>>
>> [1] https://lwn.net/Articles/752046/
>>
> 
> Hi!
> 
> From my own testing, the TCP RX Zerocopy is quite heavy on the page unmapping side. Since the io_uring implementation is expected to be lighter (see patch 11), I would expect a simple comparison to show better numbers for io_uring.

Hi Pedro, I will add TCP_ZEROCOPY_RECEIVE to kperf and compare in the
next patchset.

> 
> To be fair to the existing implementation, it would then need to be paired with some 'real' computation, but that varies a lot. As we presented at netdevconf this year, HW-GRO eventually was the best option for us (no app changes, etc.), but it is still a case-by-case decision.

Why is there a need to add some computation to the benchmarks? A
benchmark is meant to be just that - a simple comparison that looks only
at the overheads of the stack. Real workloads are complex. I don't see
this feature as a universal win in all cases; the benefit is very
workload and userspace architecture dependent.

As for HW-GRO, whynotboth.jpg?