Date:   Mon, 3 Jan 2022 10:00:12 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     vitalif@...rcmc.ru
Cc:     edumazet@...gle.com, netdev@...r.kernel.org
Subject: Re: How to test TCP zero-copy receive with real NICs?

On Sat, 01 Jan 2022 20:49:49 +0000 vitalif@...rcmc.ru wrote:
> Hi!
> 
> Happy new year netdev mailing list :-)
> 
> I have questions about the Linux TCP zero-copy receive support
> described in this article, https://lwn.net/Articles/752046/, and in
> this presentation:
> https://legacy.netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf
> 
> First of all, how can I test it with real NICs?
> 
> The presentation says it requires "collaboration" from the NIC and it
> also mentions some NICs you used at Google. Which are these NICs? Was
> the standard driver used or did it require custom patches to the
> drivers?..
> 
> I tried to test zerocopy with a Mellanox ConnectX-4 and also with an
> Intel X520-DA2 (82599) and had no luck. I tried to find something like
> "header-data split" or "packet split" in the driver code. As far as I
> understand, ixgbe supported header-data split until 2012, but it was
> removed in commit f800326dca7bc158f4c886aa92f222de37993c80 ("ixgbe:
> Replace standard receive path with a page based receive"). For
> Mellanox (again, as I understand) it's not present at all...

Try a Broadcom NIC that uses the bnxt driver. It seems to work pretty
well: you just need to enable GRO-HW or set the MTU above 4k, and you'll
get header-data split automatically. It doesn't even have to be a very
recent NIC; I believe this has been supported for a number of
generations now.
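
As a rough sketch of the two options above, the helper below only prints
the commands so they can be reviewed before being run as root. The
interface name "eth0" and the MTU of 9000 are placeholders, and the
feature name "rx-gro-hw" is an assumption to check against the output of
`ethtool -k` on the actual device. Either option alone should be enough
according to the text above.

```shell
#!/bin/sh
# Dry-run helper: prints commands, does not execute anything.
# Arguments (both hypothetical defaults): interface name, MTU.
hds_commands() {
    iface="$1"
    mtu="${2:-9000}"   # any value above 4k; 9000 is just a common jumbo size
    # Option 1: turn on hardware GRO.
    printf 'ethtool -K %s rx-gro-hw on\n' "$iface"
    # Option 2: raise the MTU above 4k.
    printf 'ip link set dev %s mtu %s\n' "$iface" "$mtu"
    # List the offload settings to confirm what the driver accepted.
    printf 'ethtool -k %s\n' "$iface"
}

hds_commands "${1:-eth0}"
```

Raising the MTU only helps if the peer and the path in between also
carry the larger frames.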

> The second question is about my attempts to test it on loopback. The
> tcp_mmap test program (tools/testing/selftests/net/tcp_mmap.c in the
> kernel source) works fine on loopback, but my own examples with
> TCP_NODELAY enabled are very brittle. They only sometimes manage to
> use zero-copy (i.e. get something non-zero from getsockopt
> TCP_ZEROCOPY_RECEIVE), and only with tcp_rmem=16384 16384 16384 AND a
> 4 KB packet size. Even then, zero-copy is performed on only 30-50% of
> packets. But that's at least something... It breaks if I send larger
> portions of data, if I change the buffers back to the defaults, or if
> I send 128-byte packets before the 4096+ byte packets. I dumped the
> traffic and everything looks good there: all packets are 40 bytes +
> payload (4096 or more), I set the MSS manually to 4096, and so on.
> Even the TCP window sizes look good: shifted by wscale, they are
> always page-aligned.
> 
> tcp_mmap, at the same time, works fine and I don't see any serious
> difference between it and my test examples except TCP_NODELAY.
> 
> So the second question is: how can I make it stable with TCP_NODELAY,
> even on loopback?
> 
