lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20180831095521.08b3515f@cakuba.netronome.com> Date: Fri, 31 Aug 2018 09:55:21 +0200 From: Jakub Kicinski <jakub.kicinski@...ronome.com> To: Björn Töpel <bjorn.topel@...el.com> Cc: Björn Töpel <bjorn.topel@...il.com>, magnus.karlsson@...el.com, magnus.karlsson@...il.com, alexander.h.duyck@...el.com, alexander.duyck@...il.com, ast@...nel.org, brouer@...hat.com, daniel@...earbox.net, netdev@...r.kernel.org, jesse.brandeburg@...el.com, anjali.singhai@...el.com, peter.waskiewicz.jr@...el.com, michael.lundkvist@...csson.com, willemdebruijn.kernel@...il.com, john.fastabend@...il.com, neerav.parikh@...el.com, mykyta.iziumtsev@...aro.org, francois.ozog@...aro.org, ilias.apalodimas@...aro.org, brian.brooks@...aro.org, u9012063@...il.com, pavel@...tnetmon.com, qi.z.zhang@...el.com Subject: Re: [PATCH bpf-next 08/11] i40e: add AF_XDP zero-copy Rx support On Thu, 30 Aug 2018 14:06:24 +0200, Björn Töpel wrote: > On 2018-08-29 21:14, Jakub Kicinski wrote: > > On Tue, 28 Aug 2018 14:44:32 +0200, Björn Töpel wrote: > >> From: Björn Töpel <bjorn.topel@...el.com> > >> > >> This patch adds zero-copy Rx support for AF_XDP sockets. Instead of > >> allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are > >> allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain > >> queue. > >> > >> All AF_XDP specific functions are added to a new file, i40e_xsk.c. > >> > >> Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS > >> will allocate a new buffer and copy the zero-copy frame prior passing > >> it to the kernel stack. > >> > >> Signed-off-by: Björn Töpel <bjorn.topel@...el.com> > > > > Mm.. I'm surprised you don't run into buffer reuse issues that I had > > when playing with AF_XDP. What happens in i40e if someone downs the > > interface? Will UMEMs get destroyed? Will the RX buffers get freed? > > > > The UMEM will linger in the driver until the sockets are dead. > > > I'll shortly send an RFC with my quick and dirty RX buffer reuse queue, > > FWIW. > > > > Some background for folks that don't know the details: A zero-copy > capable driver picks buffers off the fill ring and places them on the > hardware Rx ring to be completed at a later point when DMA is > complete. Analogous for the Tx side; The driver picks buffers off the > Tx ring and places them on the Tx hardware ring. > > In the typical flow, the Rx buffer will be placed onto an Rx ring > (completed to the user), and the Tx buffer will be placed on the > completion ring to notify the user that the transfer is done. > > However, if the driver needs to tear down the hardware rings for some > reason (interface goes down, reconfiguration and such), what should be > done with the Rx and Tx buffers that has been given to the driver? > > So, to frame the problem: What should a driver do when this happens, > so that buffers aren't leaked? > > Note that when the UMEM is going down, there's no need to complete > anything, since the sockets are dying/dead already. > > This is, as you state, a missing piece in the implementation and needs > to be fixed. > > Now on to possible solutions: > > 1. Complete the buffers back to the user. For Tx, this is probably the > best way -- just place the buffers onto the completion ring. > > For Rx, we can give buffers back to user space by setting the > length in the Rx descriptor to zero And putting them on the Rx > ring. However, one complication here is that we do not have any > back-pressure mechanism for the Rx side like we have on Tx. If the > Rx ring(s) is (are) full the kernel will have to leak them or > implement a retry mechanism (ugly and should be avoided). > > Another option to solve this without needing any retry or leaking > for Rx is to implement the same back-pressure mechanism that we > have on the Tx path in the Rx path. In the Tx path, the driver will > only get a Tx packet to send if there is space for it in the > completion ring. On Rx, this would be that the driver would only > get a buffer from the fill ring if there is space for it to put it > on the Rx ring. The drawback of this is that it would likely impact > performance negatively since the Rx ring would have to be touch one > more time (in the Tx path, it increased performance since it made > it possible to implement the Tx path without any buffering), but it > would guarantee that all buffers can always be returned to user > space making solution this a viable option. > > 2. Store the buffers internally in the driver, and make sure that they > are inserted into the "normal flow" again. For Rx that would be > putting the buffers back into the allocation scheme that the driver > is using. For Tx, placing the buffers back onto the Tx HW ring > (plus all the logic for making sure that all corner cases work). > > 3. Mark the socket(s) as in error state, en require the user to redo > the setup. This is bit harsh... That's a good summary, one more (bad) option: 4. Disallow any operations on the device which could lead to RX buffer discards if any UMEMs are attached. > For i40e I think #2 for Rx (buffers reside in kernel, return to > allocator) and #1 for Tx (complete to userland). > > Your RFC is plumbing to implement #2 for Rx in a driver. I'm not a fan > of extending the umem with the "reuse queue". This decision is really > up the driver. Some driver might prefer another scheme, or simply > prefer storing the buffers in another manner. The only performance cost is the extra pointer in xdp_umem. Drivers have to opt-in by using the *_rq() helpers. We can move the extra pointer to driver structs, and have them pass it to the helpers, so that would be zero extra cost, and we can reuse the implementation. > Looking forward, as both you and Jesper has alluded to, we need a > proper allocator for zero-copy. Then it would be a matter of injecting > the Rx buffers back to the allocator. > > I'll send out a patch, so that we don't leak the buffers, but I'd like > to hear your thoughts on what the behavior should be. > > And what should the behavior be when the netdev is removed from the > kernel? What would AF_PACKET do, return ENXIO? > And thanks for looking into the code! > > Björn
Powered by blists - more mailing lists