Date: Sat, 01 Jan 2022 20:49:49 +0000
From: vitalif@...rcmc.ru
To: edumazet@...gle.com, netdev@...r.kernel.org
Subject: How to test TCP zero-copy receive with real NICs?
Hi!
Happy New Year, netdev mailing list :-)
I have some questions about your Linux TCP zero-copy receive support, which is described in this LWN article: https://lwn.net/Articles/752046/ and in this presentation: https://legacy.netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf
First of all, how can it be tested with real NICs?
The presentation says that it requires "collaboration" from the NIC, and it also mentions some NICs you used at Google. Which NICs are these? Did you use the standard drivers, or did they require custom patches?
I tried to test zero-copy with a Mellanox ConnectX-4 and with an Intel X520-DA2 (82599), without success. I looked for something like "header-data split" or "packet split" in the driver code. As far as I understand, ixgbe supported header-data split until 2012, but that support was removed in commit f800326dca7bc158f4c886aa92f222de37993c80 ("ixgbe: Replace standard receive path with a page based receive"). For Mellanox (again, as far as I understand), it is not present at all...
The second question is about my attempts to test it on loopback. The test program tcp_mmap (tools/testing/selftests/net/tcp_mmap.c in the kernel source) works fine on loopback, but my own examples with TCP_NODELAY enabled are very brittle. They only sometimes manage to use zero-copy (i.e. get a non-zero length back from getsockopt(TCP_ZEROCOPY_RECEIVE)), and only with tcp_rmem=16384 16384 16384 AND a 4 KB packet size. Even then, zero-copy happens for only 30-50% of the packets, but that is at least something... It breaks if I send larger chunks of data, if I change the buffer sizes back to the defaults, or if I send 128-byte packets before the 4096+ byte packets.
I dumped the traffic and everything looks fine there: every packet is 40 bytes of headers plus a payload of 4096 bytes or more, I set the MSS manually to 4096, and so on. Even the TCP window sizes look good: when shifted by wscale, they are always page-aligned.
Meanwhile, tcp_mmap works fine, and I don't see any significant difference between it and my test examples except TCP_NODELAY.
So the second question is: how can I make it work reliably with TCP_NODELAY, even on loopback? :)
--
With best regards,
Vitaliy Filippov