Message-ID: <CAJ8uoz1nyv-_X5+z-nwyDOc628uYwmUVJCLkXJpsHgFK_QV+wQ@mail.gmail.com>
Date: Fri, 6 Nov 2020 20:09:42 +0100
From: Magnus Karlsson <magnus.karlsson@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Björn Töpel <bjorn.topel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
bpf <bpf@...r.kernel.org>, jeffrey.t.kirsher@...el.com,
anthony.l.nguyen@...el.com,
"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
Maciej Fijalkowski <maciejromanfijalkowski@...il.com>,
intel-wired-lan <intel-wired-lan@...ts.osuosl.org>
Subject: Re: [PATCH bpf-next 1/6] i40e: introduce lazy Tx completions for
 AF_XDP zero-copy

On Thu, Nov 5, 2020 at 4:45 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 5 Nov 2020 15:17:50 +0100 Magnus Karlsson wrote:
> > > I feel like this needs a big fat warning somewhere.
> > >
> > > It's perfectly fine to never complete TCP packets, but AF_XDP could be
> > > used to implement protocols in user space. What if someone wants to
> > > implement something like TSQ?
> >
> > I might misunderstand you, but with TSQ here (for something that
> > bypasses qdisc and any buffering and just goes straight to the driver)
> > you mean the ability to have just a few buffers outstanding and
> > continuously reuse these? If so, that is likely best achieved by
> > setting a low Tx queue size on the NIC. Note that even without this
> > patch, completions could be delayed. Though this patch makes that the
> > normal case. In any way, I think this calls for some improved
> > documentation.
>
> TSQ tries to limit the amount of data the TCP stack queues into TC/sched
> and drivers. Say 1MB ~ 16 GSO frames. It will not queue more data until
> some of the transfer is reported as completed.

Thanks. Got it. There is one more use case I can think of for quick
completions of Tx buffers, and that is if you have metadata associated
with the completion, for example a Tx timestamp. This capability does
not exist today, but hopefully it will get added at some point.

Anyway, after some more thinking, I would like to remove this patch
from the patch set and put it on the shelf for a while. The reason is
that if we can get a good busy-poll solution for AF_XDP sockets, then
we do not need this patch. With busy-poll, the choice of when to
complete Tx buffers would be left to the application in a nice way. If
the application would like to get buffers completed quickly (at the
cost of some performance), it would call sendto() (or friends) soon
after it put the packet on the Tx ring. If max throughput is desired
with no regard to when a buffer is returned, then sendto() would be
called only after a large batch of packets has been put on the Tx
ring. No need for any threshold or new knob; in other words, much
nicer. So let us wait for Björn's busy-poll patches and see where they
lead. Please protest if you do not agree. Otherwise I will submit a v2
without this patch and with Maciej's proposed simplification.

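
To illustrate the sendto() point, here is a minimal sketch of how an
application might pick its own completion latency simply by deciding
how often it kicks the socket. It is not taken from the patch set:
send_packets(), kick_fn and kick_stub() are hypothetical names, the
descriptor-filling step is only a placeholder, and it assumes that a
zero-length, non-blocking sendto() is what drives Tx processing (and
thereby completions) on the socket.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical callback type: ask the kernel to process the Tx ring. */
typedef int (*kick_fn)(int fd);

/* Kick for a real AF_XDP socket: zero-length, non-blocking sendto(). */
int kick_sendto(int xsk_fd)
{
	return (int)sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
}

/* Counting stub so the policy can be exercised without a NIC. */
unsigned int kick_count;

int kick_stub(int fd)
{
	(void)fd;
	kick_count++;
	return 0;
}

/*
 * Queue npkts packets and kick once per `batch` packets.
 * batch == 1 favours quick completions; a large batch favours throughput.
 * Returns the number of kicks issued.
 */
unsigned int send_packets(int fd, unsigned int npkts, unsigned int batch,
			  kick_fn kick)
{
	unsigned int pending = 0, kicks = 0;

	if (batch == 0)
		batch = 1;

	for (unsigned int i = 0; i < npkts; i++) {
		/* Placeholder: a real program fills a Tx descriptor here. */
		pending++;
		if (pending == batch) {
			kick(fd);
			kicks++;
			pending = 0;
		}
	}
	if (pending) {		/* flush the final partial batch */
		kick(fd);
		kicks++;
	}
	return kicks;
}
```

With batch set to 1, the application pays one system call per packet
and can expect its buffers back soon; with batch set to 64, the same 64
packets cost a single call. The policy lives entirely in the
application, which is the reason no threshold or knob is needed in the
driver.
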
> IIUC you're allowing up to 64 descriptors to linger without reporting
> back that the transfer is done. That means that user space implementing
> a scheme similar to TSQ may see its transfers stalled.
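
The stall described here can be sketched with a toy model. Everything
in it is an assumption made for illustration: the sender refuses to
queue more once `limit` descriptors are outstanding (16, following the
"16 GSO frames" figure), the driver reports completions only once
`report_threshold` descriptors have accumulated (64, the number quoted
above), and the hardware finishes every descriptor instantly. The
function name is hypothetical, and this is not how i40e or TSQ are
actually implemented.

```c
/*
 * Toy model of a TSQ-like sender on top of lazily reported completions.
 * Returns how many of `total` packets get queued before the sender
 * stalls waiting for a completion that is never reported.
 */
unsigned int packets_sent_before_stall(unsigned int total,
				       unsigned int limit,
				       unsigned int report_threshold)
{
	unsigned int sent = 0;
	unsigned int outstanding = 0;	/* sent, completion not yet reported */

	while (sent < total && outstanding < limit) {
		sent++;
		outstanding++;
		/* The driver reports completions only in lumps. */
		if (outstanding >= report_threshold)
			outstanding = 0;
	}
	return sent;
}
```

In this model, a limit of 16 against a reporting threshold of 64 stops
after 16 packets, while a limit of 64 or a threshold of 1 lets all
packets through.
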