Date:   Sat, 01 Jan 2022 20:49:49 +0000
From:   vitalif@...rcmc.ru
To:     edumazet@...gle.com, netdev@...r.kernel.org
Subject: How to test TCP zero-copy receive with real NICs?

Hi!

Happy new year, netdev mailing list :-)

I have some questions about the Linux TCP zero-copy receive support described in this article: https://lwn.net/Articles/752046/ and in this presentation: https://legacy.netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf

First of all, how can I test it with real NICs?

The presentation says it requires "collaboration" from the NIC, and it also mentions some NICs you used at Google. Which NICs are these? Was the standard driver used, or did it require custom driver patches?

I tried to test zero-copy with a Mellanox ConnectX-4 and with an Intel X520-DA2 (82599) and had no luck. I searched the driver code for something like "header-data split" or "packet split". As far as I understand, ixgbe supported header-data split until 2012, but the support was removed in commit f800326dca7bc158f4c886aa92f222de37993c80 ("ixgbe: Replace standard receive path with a page based receive"). For Mellanox (again, as far as I understand) it's not present at all...

The second question is about my attempts to test it on loopback. The tcp_mmap test program (tools/testing/selftests/net/tcp_mmap.c from the kernel source) works fine on loopback, but my own examples with TCP_NODELAY enabled are very brittle. They only sometimes manage to use zero-copy (i.e. get a non-zero result from getsockopt TCP_ZEROCOPY_RECEIVE), and only with tcp_rmem=16384 16384 16384 AND a 4 KB packet size. Even in that case zero-copy is only used for 30-50% of packets. But that's at least something... If I send larger portions of data, it breaks. If I change the buffers back to the defaults, it also breaks. If I send 128-byte packets before the 4096+ byte packets, it also breaks.

I dumped the traffic and everything looks good there: all packets are 40 bytes + payload (4096 bytes or more), I set the MSS manually to 4096, and so on. Even the TCP window sizes look good: if I shift them by wscale, they are always page-aligned.
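
As a rough illustration of the alignment checks described above (payload size and scaled window both being whole multiples of the page size), here is a minimal sketch. The helper names `effective_window` and `is_page_multiple` are hypothetical, and a 4096-byte page is assumed, matching the 4 KB size mentioned above; it is not taken from the kernel or from tcp_mmap.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the window a peer actually advertises is the raw
 * 16-bit header field shifted left by the negotiated window scale. */
static uint32_t effective_window(uint16_t raw_window, unsigned wscale)
{
    return (uint32_t)raw_window << wscale;
}

/* Hypothetical helper: true if nbytes is a non-zero whole number of pages.
 * page_size must be a power of two (4096 is assumed in this thread). */
static bool is_page_multiple(uint32_t nbytes, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return nbytes != 0 && (nbytes & (page_size - 1)) == 0;
}
```

For example, a raw window of 512 with wscale 7 gives 65536 bytes, which is a page multiple, while a 4096-byte payload preceded by a 128-byte one adds up to 4224 bytes of queued data, which is not.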

tcp_mmap, meanwhile, works fine, and I don't see any significant difference between it and my test examples except TCP_NODELAY.
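
For reference, the receive step under discussion can be sketched roughly as follows, modeled on the approach tcp_mmap.c takes: map a region over the socket with mmap(), then ask the kernel to fill it via getsockopt(TCP_ZEROCOPY_RECEIVE). The helper names `map_rx_region` and `zerocopy_receive_once` are hypothetical, error handling is minimal, and the snippet assumes kernel headers recent enough to have `recv_skip_hint` in `struct tcp_zerocopy_receive`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <linux/tcp.h>    /* TCP_ZEROCOPY_RECEIVE, struct tcp_zerocopy_receive */

/* Hypothetical helper: reserve a read-only mapping over a TCP socket.
 * Returns NULL on failure. chunk should be a multiple of the page size. */
static void *map_rx_region(int fd, size_t chunk)
{
    void *addr = mmap(NULL, chunk, PROT_READ, MAP_SHARED, fd, 0);
    return addr == MAP_FAILED ? NULL : addr;
}

/* Hypothetical helper: one zero-copy receive attempt.
 * Returns the number of bytes the kernel mapped at addr (may be 0),
 * or -1 on error. *skip receives the number of bytes that could not be
 * mapped and have to be fetched with a normal read()/recv(). */
static long zerocopy_receive_once(int fd, void *addr, size_t chunk,
                                  uint32_t *skip)
{
    struct tcp_zerocopy_receive zc;
    socklen_t zc_len = sizeof(zc);

    assert(skip != NULL);
    memset(&zc, 0, sizeof(zc));
    zc.address = (uint64_t)(uintptr_t)addr;
    zc.length = (uint32_t)chunk;

    if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len) == -1)
        return -1;

    *skip = zc.recv_skip_hint;
    return (long)zc.length;
}
```

A return value of 0 together with a non-zero `*skip` is the "zero-copy did not happen" case described above: the kernel mapped nothing, and that many bytes have to be read the ordinary way before trying again.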

So the second question is: how can I make it stable with TCP_NODELAY, even on loopback? :)

-- 
With best regards,
  Vitaliy Filippov
