Message-ID: <20220103100012.7507e0e1@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
Date: Mon, 3 Jan 2022 10:00:12 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: vitalif@...rcmc.ru
Cc: edumazet@...gle.com, netdev@...r.kernel.org
Subject: Re: How to test TCP zero-copy receive with real NICs?
On Sat, 01 Jan 2022 20:49:49 +0000 vitalif@...rcmc.ru wrote:
> Hi!
>
> Happy new year netdev mailing list :-)
>
> I have questions about your Linux TCP zero-copy receive support, which
> is described in this article, https://lwn.net/Articles/752046/, and in
> this presentation:
> https://legacy.netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf
>
> First of all, how can I test it with real NICs?
>
> The presentation says it requires "collaboration" from the NIC, and it
> also mentions some NICs you used at Google. Which NICs are these? Was
> the standard driver used, or did it require custom patches to the
> drivers?
>
> I tried to test zero-copy with a Mellanox ConnectX-4 and with an Intel
> X520-DA2 (82599) and had no luck. I looked for something like
> "header-data split" or "packet split" in the driver code. As far as I
> understand, ixgbe supported header-data split until 2012, but it was
> removed in commit f800326dca7bc158f4c886aa92f222de37993c80 ("ixgbe:
> Replace standard receive path with a page based receive"). For
> Mellanox (again, as far as I understand) it's not present at all...
Try a Broadcom NIC that uses the bnxt driver. It seems to work pretty
well; you just need to enable GRO-HW or set the MTU above 4k and you'll
get header-data split automatically. It doesn't even have to be a very
recent NIC; I believe this has been supported for a number of
generations now.
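
As a rough sketch of the two options above (the interface name eth0 and
the MTU value 9000 are placeholders, not taken from this thread; on most
systems GRO-HW is exposed through ethtool as the rx-gro-hw feature):

```shell
# Placeholder interface name; substitute the bnxt port under test.
IFACE=eth0

# Option 1: enable hardware GRO, then confirm the feature state.
ethtool -K "$IFACE" rx-gro-hw on
ethtool -k "$IFACE" | grep rx-gro-hw

# Option 2: raise the MTU above 4k. 9000 is only an example value;
# the peer and any switch in between must accept frames of that size.
ip link set dev "$IFACE" mtu 9000
```

Either option alone should be enough according to the description above;
both commands need root privileges.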
> The second question is about my attempts to test it on loopback. The
> tcp_mmap test program (tools/testing/selftests/net/tcp_mmap.c in the
> kernel source) works fine on loopback, but my own examples with
> TCP_NODELAY enabled are very brittle. They only sometimes manage to
> use zero-copy (i.e. get something non-zero from getsockopt
> TCP_ZEROCOPY_RECEIVE), and only with tcp_rmem=16384 16384 16384 AND a
> 4 KB packet size. Even then, zero-copy is performed on only 30-50% of
> packets. But that's at least something... If I send larger portions
> of data, it breaks. If I change the buffers back to the default, it
> also breaks. If I send 128-byte packets before the 4096+ byte packets,
> it also breaks. I dumped the traffic and everything looks good there:
> all packets are 40 bytes + payload (4096 or more), I set the MSS
> manually to 4096, and so on. Even the TCP window sizes look good: if I
> shift them by wscale, they are always page-aligned.
>
> tcp_mmap, at the same time, works fine and I don't see any serious
> difference between it and my test examples except TCP_NODELAY.
>
> So the second question is: how can I make it stable with TCP_NODELAY,
> even on loopback?
>