Message-ID: <CANn89i+vsi3PpZz+tccsDn76k9oq3XNpDKEQWRhamm-t9EAZrA@mail.gmail.com>
Date: Tue, 4 Jan 2022 01:57:33 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: vitalif@...rcmc.ru, netdev <netdev@...r.kernel.org>
Subject: Re: How to test TCP zero-copy receive with real NICs?

On Mon, Jan 3, 2022 at 10:00 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Sat, 01 Jan 2022 20:49:49 +0000 vitalif@...rcmc.ru wrote:
> > Hi!
> >
> > Happy new year netdev mailing list :-)
> >
> > I have questions about your Linux TCP zero-copy support which is
> > described in these articles https://lwn.net/Articles/752046/ and
> > presentation:
> > https://legacy.netdevconf.info/0x14/pub/slides/62/Implementing%20TCP%20RX%20zero%20copy.pdf
> >
> > First of all, how to test it with real NICs?
> >
> > The presentation says it requires "collaboration" from the NIC and it
> > also mentions some NICs you used at Google. Which are these NICs? Was
> > the standard driver used or did it require custom patches to the
> > drivers?..
> >
> > I tried to test zerocopy with Mellanox ConnectX-4 and also with Intel
> > X520-DA2 (82599) and had no luck. I tried to find something like
> > "header-data split" or "packet split" in the drivers code, and as far
> > as I understood the support for header-data split in ixgbe was there
> > until 2012, but was removed in commit
> > f800326dca7bc158f4c886aa92f222de37993c80 ("ixgbe: Replace standard
> > receive path with a page based receive"). For Mellanox (again, as I
> > understand) it's not present at all...
>
> Try a Broadcom NIC that uses the bnxt driver. It seems to work pretty
> well, just need to enable GRO-HW or MTU > 4k and you'll get header-data
> split automatically. Doesn't even have to be a very recent NIC,
> I believe it's supported for a number of generations now.
>
> > The second question is more about my attempts to test it on loopback
> > - test tcp_mmap program (tools/testing/selftests/net/tcp_mmap.c from
> > the kernel source) works fine on loopback, but my examples with
> > TCP_NODELAY enabled are very brittle and only manage to sometimes use
> > zero-copy successfully (i.e. get something non-zero from getsockopt
> > TCP_ZEROCOPY_RECEIVE) with tcp_rmem=16384 16384 16384 AND 4 KB packet
> > size. And even in that case it only performs zerocopy on 30-50% of
> > packets. But that's at least something... And if I try to send larger
> > portions of data it breaks... And if I try to change buffers to
> > default it also breaks... And if I send 128 byte packets before 4096+
> > byte packets it also breaks... I tried to dump traffic and everything
> > looks good there, all packets are 40 bytes + payload(4096 or more), I
> > set MSS manually to 4096 and so on. Even tcp window sizes look good -
> > if I shift them by wscale they are always page-aligned.
> >
> > tcp_mmap, at the same time, works fine and I don't see any serious
> > difference between it and my test examples except TCP_NODELAY.
> >
> > So the second question is - how to make it stable with TCP_NODELAY,
> > even on loopback?)
> >
>
An mlx4 patch is doable, if you know the size of the expected headers
and if you do not use XDP.
We use IPv6 + TCP with TS (timestamp) options, for a total of 86 bytes
of headers.
I do not have time to cook up an mlx4 patch based on the current
upstream tree, maybe later this month.