Message-ID: <20220103100012.7507e0e1@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
Date: Mon, 3 Jan 2022 10:00:12 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: vitalif@...rcmc.ru
Cc: edumazet@...gle.com, netdev@...r.kernel.org
Subject: Re: How to test TCP zero-copy receive with real NICs?

On Sat, 01 Jan 2022 20:49:49 +0000 vitalif@...rcmc.ru wrote:
> Hi!
>
> Happy new year netdev mailing list :-)
>
> I have questions about your Linux TCP zero-copy support which is
> described in these articles https://lwn.net/Articles/752046/ and
> presentation:
> https://legacy.netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf
>
> First of all, how to test it with real NICs?
>
> The presentation says it requires "collaboration" from the NIC and it
> also mentions some NICs you used at Google. Which are these NICs? Was
> the standard driver used or did it require custom patches to the
> drivers?..
>
> I tried to test zerocopy with Mellanox ConnectX-4 and also with Intel
> X520-DA2 (82599) and had no luck. I tried to find something like
> "header-data split" or "packet split" in the drivers code, and as far
> as I understood the support for header-data split in ixgbe was there
> until 2012, but was removed in commit
> f800326dca7bc158f4c886aa92f222de37993c80 ("ixgbe: Replace standard
> receive path with a page based receive"). For Mellanox (again, as I
> understand) it's not present at all...

Try a Broadcom NIC that uses the bnxt driver. It seems to work pretty
well, just need to enable GRO-HW or MTU > 4k and you'll get header-data
split automatically. Doesn't even have to be a very recent NIC, I
believe it's supported for a number of generations now.

> The second question is more about my attempts to test it on loopback
> - test tcp_mmap program (tools/testing/selftests/net/tcp_mmap.c from
> the kernel source) works fine on loopback, but my examples with
> TCP_NODELAY enabled are very brittle and only manage to sometimes use
> zero-copy successfully (i.e. get something non-zero from getsockopt
> TCP_ZEROCOPY_RECEIVE) with tcp_rmem=16384 16384 16384 AND 4 kb packet
> size. And even in that case it only performs zerocopy on 30-50% of
> packets. But that's at least something... And if I try to send larger
> portions of data it breaks... And if I try to change buffers to
> default it also breaks... And if I send 128 byte packets before 4096+
> byte packets it also breaks... I tried to dump traffic and everything
> looks good there, all packets are 40 bytes + payload(4096 or more), I
> set MSS manually to 4096 and so on. Even tcp window sizes look good -
> if I shift them by wscale they are always page-aligned.
>
> tcp_mmap, at the same time, works fine and I don't see any serious
> difference between it and my test examples except TCP_NODELAY.
>
> So the second question is - how to make it stable with TCP_NODELAY,
> even on loopback?)
>
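
[Editor's sketch, not part of the original message.] The manual checks described in the quoted text (payload = packet length minus 40 bytes of headers, and window << wscale being page-aligned) can be scripted when going through a packet dump. The helper names below are hypothetical, and the 4096-byte page size and 40-byte header length are assumptions taken from the figures in the quoted text, not values read from the running system.

```python
# Assumed values, taken from the numbers quoted in the message.
PAGE_SIZE = 4096    # page size assumed by the poster's 4 kb packets
HEADER_LEN = 40     # IPv4 + TCP headers without options, per the quoted dump


def effective_window(raw_window: int, wscale: int) -> int:
    """Return the window in bytes: the 16-bit header field shifted by wscale."""
    return raw_window << wscale


def payload_len(packet_len: int, header_len: int = HEADER_LEN) -> int:
    """Return the TCP payload size for a packet of packet_len bytes."""
    return packet_len - header_len


def is_page_aligned(nbytes: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True when nbytes is a whole number of pages."""
    return nbytes % page_size == 0


if __name__ == "__main__":
    # Example values only; substitute numbers read from a real capture.
    for raw, scale in [(512, 7), (501, 7)]:
        win = effective_window(raw, scale)
        print(raw, scale, win, is_page_aligned(win))
    print(payload_len(4136), is_page_aligned(payload_len(4136)))
```

This only automates the arithmetic the poster did by hand. It does not explain why zero-copy succeeds on only part of the packets.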