Message-ID: <20180831095521.08b3515f@cakuba.netronome.com>
Date: Fri, 31 Aug 2018 09:55:21 +0200
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Björn Töpel <bjorn.topel@...el.com>
Cc: Björn Töpel <bjorn.topel@...il.com>,
magnus.karlsson@...el.com, magnus.karlsson@...il.com,
alexander.h.duyck@...el.com, alexander.duyck@...il.com,
ast@...nel.org, brouer@...hat.com, daniel@...earbox.net,
netdev@...r.kernel.org, jesse.brandeburg@...el.com,
anjali.singhai@...el.com, peter.waskiewicz.jr@...el.com,
michael.lundkvist@...csson.com, willemdebruijn.kernel@...il.com,
john.fastabend@...il.com, neerav.parikh@...el.com,
mykyta.iziumtsev@...aro.org, francois.ozog@...aro.org,
ilias.apalodimas@...aro.org, brian.brooks@...aro.org,
u9012063@...il.com, pavel@...tnetmon.com, qi.z.zhang@...el.com
Subject: Re: [PATCH bpf-next 08/11] i40e: add AF_XDP zero-copy Rx support
On Thu, 30 Aug 2018 14:06:24 +0200, Björn Töpel wrote:
> On 2018-08-29 21:14, Jakub Kicinski wrote:
> > On Tue, 28 Aug 2018 14:44:32 +0200, Björn Töpel wrote:
> >> From: Björn Töpel <bjorn.topel@...el.com>
> >>
> >> This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
> >> allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
> >> allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
> >> queue.
> >>
> >> All AF_XDP specific functions are added to a new file, i40e_xsk.c.
> >>
> >> Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
> >> will allocate a new buffer and copy the zero-copy frame prior passing
> >> it to the kernel stack.
> >>
> >> Signed-off-by: Björn Töpel <bjorn.topel@...el.com>
> >
> > Mm.. I'm surprised you don't run into buffer reuse issues that I had
> > when playing with AF_XDP. What happens in i40e if someone downs the
> > interface? Will UMEMs get destroyed? Will the RX buffers get freed?
> >
>
> The UMEM will linger in the driver until the sockets are dead.
>
> > I'll shortly send an RFC with my quick and dirty RX buffer reuse queue,
> > FWIW.
> >
>
> Some background for folks that don't know the details: A zero-copy
> capable driver picks buffers off the fill ring and places them on the
> hardware Rx ring to be completed at a later point when DMA is
> complete. The Tx side is analogous: the driver picks buffers off the
> Tx ring and places them on the hardware Tx ring.
>
> In the typical flow, the Rx buffer will be placed onto an Rx ring
> (completed to the user), and the Tx buffer will be placed on the
> completion ring to notify the user that the transfer is done.
>
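To make the flow concrete, here's a toy standalone sketch of the buffer
circulation you describe. The ring layout and helper names are mine,
for illustration only -- not the actual AF_XDP or i40e structures:

    /* Toy single-producer/single-consumer ring of buffer addresses.
     * Free-running prod/cons indices; size must be a power of two. */
    struct ring {
        unsigned long long *addrs;
        unsigned int size, prod, cons;
    };

    static int ring_get(struct ring *r, unsigned long long *addr)
    {
        if (r->cons == r->prod)
            return -1;                      /* empty */
        *addr = r->addrs[r->cons++ & (r->size - 1)];
        return 0;
    }

    static int ring_put(struct ring *r, unsigned long long addr)
    {
        if (r->prod - r->cons == r->size)
            return -1;                      /* full */
        r->addrs[r->prod++ & (r->size - 1)] = addr;
        return 0;
    }

    /* Rx: fill ring -> HW Rx ring; completed frames are later put on
     * the Rx ring. Tx mirrors this: Tx ring -> HW Tx ring, then the
     * completion ring. */
    static void rx_refill(struct ring *fill, struct ring *hw_rx)
    {
        unsigned long long addr;

        /* Check HW ring space first, so a fill entry is only consumed
         * when it can actually be posted. */
        while (hw_rx->prod - hw_rx->cons < hw_rx->size &&
               !ring_get(fill, &addr))
            ring_put(hw_rx, addr);
    }
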
> However, if the driver needs to tear down the hardware rings for some
> reason (interface goes down, reconfiguration and such), what should be
> done with the Rx and Tx buffers that have been given to the driver?
>
> So, to frame the problem: What should a driver do when this happens,
> so that buffers aren't leaked?
>
> Note that when the UMEM is going down, there's no need to complete
> anything, since the sockets are dying/dead already.
>
> This is, as you state, a missing piece in the implementation and needs
> to be fixed.
>
> Now on to possible solutions:
>
> 1. Complete the buffers back to the user. For Tx, this is probably the
> best way -- just place the buffers onto the completion ring.
>
> For Rx, we can give buffers back to user space by setting the
> length in the Rx descriptor to zero and putting them on the Rx
> ring. However, one complication here is that we do not have any
> back-pressure mechanism for the Rx side like we have on Tx. If the
> Rx ring(s) is (are) full, the kernel will have to leak them or
> implement a retry mechanism (ugly and should be avoided).
>
> Another option to solve this without needing any retry or leaking
> for Rx is to implement the same back-pressure mechanism that we
> have on the Tx path in the Rx path. In the Tx path, the driver will
> only get a Tx packet to send if there is space for it in the
> completion ring. On Rx, the equivalent would be that the driver only
> gets a buffer from the fill ring if there is space to put it on the
> Rx ring. The drawback is that it would likely impact performance
> negatively, since the Rx ring would have to be touched one more time
> (in the Tx path it increased performance, since it made it possible
> to implement the Tx path without any buffering), but it would
> guarantee that all buffers can always be returned to user space,
> making this solution a viable option.
>
> 2. Store the buffers internally in the driver, and make sure that they
> are inserted into the "normal flow" again. For Rx that would be
> putting the buffers back into the allocation scheme that the driver
> is using. For Tx, placing the buffers back onto the Tx HW ring
> (plus all the logic for making sure that all corner cases work).
>
> 3. Mark the socket(s) as being in an error state, and require the user
> to redo the setup. This is a bit harsh...
That's a good summary, one more (bad) option:
4. Disallow any operations on the device which could lead to RX buffer
discards if any UMEMs are attached.
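To make the teardown options concrete, a sketch of #1 on top of the toy
rings above (invented helpers again, not driver code): Tx buffers go
straight onto the completion ring; Rx buffers are completed with
len == 0, and without Rx back-pressure whatever doesn't fit is exactly
the leak discussed above.

    struct rx_desc { unsigned long long addr; unsigned int len; };

    /* Toy Rx descriptor ring, same indexing scheme as struct ring. */
    struct rx_ring {
        struct rx_desc *descs;
        unsigned int size, prod, cons;
    };

    /* Tx teardown (#1): complete every in-flight buffer. Safe as long
     * as the driver reserved completion-ring space per packet, which
     * is the existing Tx back-pressure rule. */
    static void tx_teardown(struct ring *hw_tx, struct ring *comp)
    {
        unsigned long long addr;

        while (!ring_get(hw_tx, &addr))
            ring_put(comp, addr);
    }

    /* Rx teardown (#1): complete buffers with len == 0. If the Rx
     * ring fills up, the rest leaks -- hence the proposed fill-ring
     * back-pressure, or a reuse queue (#2). */
    static void rx_teardown(struct ring *hw_rx, struct rx_ring *rx)
    {
        unsigned long long addr;

        while (!ring_get(hw_rx, &addr)) {
            if (rx->prod - rx->cons == rx->size)
                break;                      /* full: buffers leak */
            rx->descs[rx->prod++ & (rx->size - 1)] =
                (struct rx_desc){ .addr = addr, .len = 0 };
        }
    }
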
> For i40e I think #2 for Rx (buffers reside in kernel, return to
> allocator) and #1 for Tx (complete to userland).
>
> Your RFC is plumbing to implement #2 for Rx in a driver. I'm not a fan
> of extending the umem with the "reuse queue". This decision is really
> up to the driver. Some drivers might prefer another scheme, or simply
> prefer storing the buffers in another manner.
The only performance cost is the extra pointer in xdp_umem. Drivers
have to opt in by using the *_rq() helpers. We can move the extra
pointer to driver structs, and have them pass it to the helpers, so that
would be zero extra cost, and we can reuse the implementation.
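Roughly the shape of the reuse-queue idea, again on the toy rings above
(my sketch, not the RFC code): teardown parks in-flight Rx buffers
instead of leaking them, and the next refill drains the park before
touching the fill ring.

    /* Teardown with a reuse queue: park buffers instead of leaking.
     * Sized >= the HW ring, so ring_put() cannot fail here. */
    static void rx_teardown_rq(struct ring *hw_rx, struct ring *reuse)
    {
        unsigned long long addr;

        while (!ring_get(hw_rx, &addr))
            ring_put(reuse, addr);
    }

    /* Refill prefers parked buffers over new fill-ring entries. */
    static void rx_refill_rq(struct ring *reuse, struct ring *fill,
                             struct ring *hw_rx)
    {
        unsigned long long addr;

        while (hw_rx->prod - hw_rx->cons < hw_rx->size) {
            if (ring_get(reuse, &addr) &&
                ring_get(fill, &addr))
                break;                      /* both empty */
            ring_put(hw_rx, addr);
        }
    }
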
> Looking forward, as both you and Jesper have alluded to, we need a
> proper allocator for zero-copy. Then it would be a matter of injecting
> the Rx buffers back into the allocator.
>
> I'll send out a patch, so that we don't leak the buffers, but I'd like
> to hear your thoughts on what the behavior should be.
>
> And what should the behavior be when the netdev is removed from the
> kernel?
What would AF_PACKET do, return ENXIO?
> And thanks for looking into the code!
>
> Björn