Message-ID: <20181001130313.318065fd@redhat.com>
Date: Mon, 1 Oct 2018 13:03:13 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Ilias Apalodimas <ilias.apalodimas@...aro.org>
Cc: netdev@...r.kernel.org, jaswinder.singh@...aro.org,
ard.biesheuvel@...aro.org, masami.hiramatsu@...aro.org,
arnd@...db.de, bjorn.topel@...el.com, magnus.karlsson@...el.com,
daniel@...earbox.net, ast@...nel.org,
jesus.sanchez-palencia@...el.com, vinicius.gomes@...el.com,
makita.toshiaki@....ntt.co.jp, Tariq Toukan <tariqt@...lanox.com>,
Tariq Toukan <ttoukan.linux@...il.com>, brouer@...hat.com
Subject: Re: [net-next, PATCH 1/2, v3] net: socionext: different approach on
DMA
On Mon, 1 Oct 2018 12:56:58 +0300
Ilias Apalodimas <ilias.apalodimas@...aro.org> wrote:
> > > #2: You have allocations on the XDP fast-path.
> > >
> > > The REAL secret behind the XDP performance is to avoid allocations on
> > > the fast-path. While I just told you to use the page-allocator and
> > > order-0 pages, this will actually kill performance. Thus, to make this
> > > fast, you need a driver local recycle scheme that avoids going through
> > > the page allocator, which makes XDP_DROP and XDP_TX extremely fast.
> > > For the XDP_REDIRECT action (which you seem to be interested in, as
> > > this is needed for AF_XDP), there is a xdp_return_frame() API that can
> > > make this fast.
> >
> > I had an initial implementation that did exactly that (that's why the
> > dma_sync_single_for_cpu() -> dma_unmap_single_attrs() is there). In the case
> > of AF_XDP isn't that introducing a 'bottleneck' though? I mean you'll feed fresh
> > buffers back to the hardware only when your packets have been processed from
> > your userspace application
>
> Just a clarification here. This is the case if ZC is implemented. In my case
> the buffers will be 'ok' to be passed back to the hardware once the
> userspace payload has been copied by xdp_do_redirect()

Thanks for clarifying. But no, this is not introducing a 'bottleneck'
for AF_XDP.

For (1) copy-mode AF_XDP, the frame (as you noticed) is "freed" or
"returned" very quickly after it is copied. The code is a bit hard to
follow, but __xsk_rcv() calls xdp_return_buff() right after the memcpy.
Thus, the frame can stay DMA mapped and be reused in the RX-ring quickly.
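
To illustrate the copy-mode point, here is a minimal userspace sketch of
"copy, then return the frame at once". It is not kernel code: rx_pool,
rx_frame, pool_get(), pool_return() and copy_mode_rcv() are hypothetical
names made up for this example, and the DMA mapping is only simulated by
a flag and a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE  4      /* hypothetical RX-ring depth */
#define FRAME_SIZE 2048   /* hypothetical per-frame buffer size */

/* Hypothetical stand-in for a driver RX frame that stays DMA mapped. */
struct rx_frame {
	unsigned char data[FRAME_SIZE];
	size_t len;
	int dma_mapped;   /* set once at pool init, never cleared on recycle */
};

/* Hypothetical driver-local recycle cache: a simple LIFO of free frames. */
struct rx_pool {
	struct rx_frame frames[POOL_SIZE];
	struct rx_frame *free_list[POOL_SIZE];
	size_t nfree;
	unsigned long map_calls;   /* counts simulated DMA-map operations */
};

static void pool_init(struct rx_pool *p)
{
	memset(p, 0, sizeof(*p));
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->frames[i].dma_mapped = 1;   /* "map" each frame exactly once */
		p->map_calls++;
		p->free_list[p->nfree++] = &p->frames[i];
	}
}

/* Take a frame for the RX-ring; NULL if the pool is empty. */
static struct rx_frame *pool_get(struct rx_pool *p)
{
	return p->nfree ? p->free_list[--p->nfree] : NULL;
}

/* Return a frame to the cache; the (simulated) mapping is kept. */
static void pool_return(struct rx_pool *p, struct rx_frame *f)
{
	assert(p->nfree < POOL_SIZE);
	f->len = 0;
	p->free_list[p->nfree++] = f;
}

/* Copy-mode receive: copy the payload out, then return the frame at once. */
static size_t copy_mode_rcv(struct rx_pool *p, struct rx_frame *f,
			    unsigned char *sock_buf, size_t sock_len)
{
	size_t n = f->len < sock_len ? f->len : sock_len;

	memcpy(sock_buf, f->data, n);
	pool_return(p, f);   /* reusable before userspace reads sock_buf */
	return n;
}
```

The frame goes back to the pool inside the receive path itself, so its
reuse never waits on the userspace application, and the number of
(simulated) map operations stays at the pool size no matter how many
packets are received.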

For (2) zero-copy AF_XDP, you need to implement a new allocator of
type MEM_TYPE_ZERO_COPY. The performance trick here is that all
DMA-map/unmap and allocations go away, given everything is
preallocated by userspace. The 4 rings (SPSC) are used for
recycling the ZC-umem frames (read Documentation/networking/af_xdp.rst).
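
As a rough illustration of recycling frames through SPSC rings, here is
a userspace sketch using C11 atomics. It only models two of the four
rings (fill and RX) and is not the kernel's ring implementation:
spsc_ring, ring_push(), ring_pop() and rx_one() are hypothetical names,
and RING_SIZE is an arbitrary power of two picked for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u   /* hypothetical; must be a power of two */

/* Hypothetical SPSC ring carrying umem frame addresses (offsets). */
struct spsc_ring {
	_Atomic uint32_t prod;   /* written only by the producer */
	_Atomic uint32_t cons;   /* written only by the consumer */
	uint64_t addr[RING_SIZE];
};

static void ring_init(struct spsc_ring *r)
{
	atomic_init(&r->prod, 0);
	atomic_init(&r->cons, 0);
}

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_push(struct spsc_ring *r, uint64_t a)
{
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);

	if (prod - cons == RING_SIZE)
		return -1;
	r->addr[prod & (RING_SIZE - 1)] = a;
	/* Publish the entry only after it has been written. */
	atomic_store_explicit(&r->prod, prod + 1, memory_order_release);
	return 0;
}

/* Consumer side: returns 0 on success, -1 if the ring is empty. */
static int ring_pop(struct spsc_ring *r, uint64_t *a)
{
	uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (prod == cons)
		return -1;
	*a = r->addr[cons & (RING_SIZE - 1)];
	/* Release the slot only after the entry has been read. */
	atomic_store_explicit(&r->cons, cons + 1, memory_order_release);
	return 0;
}

/*
 * One RX step: take a free frame from the fill ring and hand it to
 * userspace via the RX ring. Simplification: a real implementation
 * must not lose the frame when the RX ring is full.
 */
static int rx_one(struct spsc_ring *fill, struct spsc_ring *rx)
{
	uint64_t a;

	if (ring_pop(fill, &a))
		return -1;   /* userspace supplied no free frame */
	return ring_push(rx, a);
}
```

Userspace recycles a frame by popping its address from the RX ring and
pushing it back onto the fill ring. Only addresses move between the two
sides, so nothing is allocated or mapped per packet.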
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer