Message-ID: <20171116090023.27860207@redhat.com>
Date: Thu, 16 Nov 2017 09:00:23 +0100
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Björn Töpel <bjorn.topel@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
michael.lundkvist@...csson.com, ravineet.singh@...csson.com,
Daniel Borkmann <daniel@...earbox.net>,
Netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Tushar Dave <tushar.n.dave@...cle.com>, eric.dumazet@...il.com,
Björn Töpel
<bjorn.topel@...el.com>, jesse.brandeburg@...el.com,
anjali.singhai@...el.com, rami.rosen@...el.com,
jeffrey.b.shaw@...el.com, ferruh.yigit@...el.com,
qi.z.zhang@...el.com, davem@...emloft.net, brouer@...hat.com
Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support (AF_XDP or
AF_CHANNEL?)

On Tue, 14 Nov 2017 20:01:01 +0100 Björn Töpel <bjorn.topel@...il.com> wrote:
> 2017-11-14 18:19 GMT+01:00 Jesper Dangaard Brouer <brouer@...hat.com>:
> >
> > On Mon, 13 Nov 2017 22:07:47 +0900 Björn Töpel <bjorn.topel@...il.com> wrote:
> >
> >> I'll summarize the major points, that we'll address in the next RFC
> >> below.
> >>
> >> * Instead of extending AF_PACKET with yet another version, introduce a
> >> new address/packet family. As for naming, we had some name suggestions:
> >> AF_CAPTURE, AF_CHANNEL, AF_XDP and AF_ZEROCOPY. We'll go for
> >> AF_ZEROCOPY, unless there are strong opinions against it.
> >
> > I mostly like AF_CHANNEL and AF_XDP. I do know XDP is/has-evolved-into
> > a kernel-side facility that moves XDP-frames/packets _inside_ the
> > kernel.
> >
> > *BUT* I've always imagined that we would create a "channel" to
> > userspace, using XDP_REDIRECT to choose which frames get redirected
> > into which userspace "channel" (new channel-map type). Userspace
> > pre-allocates and registers memory/pages exactly like in this
> > patchset.
> >
> > [Step 1]: (non-ZC) XDP_REDIRECT needs to copy frame-data into
> > userspace memory pages, and update your packet_array etc. (Use
> > map-flush to get RX bulking).
> >
> > [Step 2]: (ZC) Userspace calls a driver NDO to register pages. The
> > XDP_REDIRECT action happens in the driver, and can have knowledge
> > about the RX-ring. It can know if this RX-ring is Zero-Copy enabled
> > and can skip the copy-step.
> >
>
> Jesper, I *really* like this approach -- especially the fact that the
> existing XDP path in the drivers can be reused. I'll spend some time
> dissecting the details of your suggestion.

I'm very happy that you like this approach :-)

> >> * No explicit zerocopy enablement. Use the zerocopy path if
> >> supported, if not -- fallback to the skb path, for netdevs that
> >> don't support the required ndos.
> >
> > When the driver does not support the NDO in the above model, I think
> > that there will still be a significant performance boost for the
> > non-ZC variant, even though we need a copy-operation, because there
> > are no memory allocations: userspace has preallocated and registered
> > pages with the kernel (and mem-limits are implicit via the mem-size
> > registered by userspace).
> >
>
> Yup, and we're not paying for the whole skb creation, given that we
> execute from XDP_DRV and not XDP_SKB.

Yes, exactly. Avoiding the SKB allocation for non-ZC mode will be a
significant saving. As your benchmarks showed, the AF_PACKET-V4
approach for non-ZC mode does not give you/us any real performance
improvement. This approach would.

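
To make the copy-mode (non-ZC) idea concrete, here is a minimal
userspace sketch in C. It only models the data flow and is not kernel
code. Every name in it (struct channel, struct chan_desc,
chan_redirect_copy(), chan_flush(), chan_consume(), FRAME_SIZE,
NUM_FRAMES) is hypothetical and not taken from the patchset. It assumes
fixed-size frames and, as a simplification, treats consuming a
descriptor as also handing its frame back to the producer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 2048 /* hypothetical fixed frame size */
#define NUM_FRAMES 8    /* hypothetical number of registered frames */

/* Hypothetical descriptor handed to userspace: which frame, how many bytes. */
struct chan_desc {
    uint32_t frame_idx;
    uint32_t len;
};

/* Hypothetical "channel": memory registered up front by userspace plus a
 * descriptor ring. Nothing is allocated on the per-packet path. */
struct channel {
    uint8_t frames[NUM_FRAMES][FRAME_SIZE]; /* stands in for registered pages */
    struct chan_desc ring[NUM_FRAMES];
    uint32_t prod;      /* descriptors written by the redirect step */
    uint32_t published; /* descriptors made visible by the flush step */
    uint32_t cons;      /* descriptors consumed by userspace */
};

/* Copy-mode redirect: copy the frame into the next free registered frame.
 * Returns 0 on success, -1 if the frame is too large or no frame is free. */
static int chan_redirect_copy(struct channel *ch, const void *data, uint32_t len)
{
    if (len > FRAME_SIZE || ch->prod - ch->cons >= NUM_FRAMES)
        return -1; /* caller would drop the packet */

    uint32_t idx = ch->prod % NUM_FRAMES;
    memcpy(ch->frames[idx], data, len); /* the only per-packet cost */
    ch->ring[idx].frame_idx = idx;
    ch->ring[idx].len = len;
    ch->prod++;
    return 0;
}

/* Flush: publish everything queued since the last flush in one step,
 * which is where the RX bulking comes from. Returns the number published. */
static uint32_t chan_flush(struct channel *ch)
{
    uint32_t n = ch->prod - ch->published;
    ch->published = ch->prod;
    return n;
}

/* Userspace side: take the next published descriptor, if any.
 * Returns 0 on success, -1 if nothing has been published yet. */
static int chan_consume(struct channel *ch, struct chan_desc *out)
{
    assert(out != NULL);
    if (ch->cons == ch->published)
        return -1;
    *out = ch->ring[ch->cons % NUM_FRAMES];
    ch->cons++;
    return 0;
}
```

In this model a ZC-enabled RX-ring would skip the memcpy() in
chan_redirect_copy(), since the frame would already live in registered
memory, and only the descriptor would need to be written.
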
> >> * Do not introduce a new XDP action XDP_PASS_TO_KERNEL, instead use
> >> XDP redirect map call with ingress flag.
> >
> > In the above model, XDP_REDIRECT is used for filtering into a userspace
> > "channel". If ZC gets enabled on an RX-ring queue, then XDP_PASS has
> > to do a copy (RX-ring knowledge is available), like you describe with
> > XDP_PASS_TO_KERNEL.
> >
>
> Again, this fits nicely in.
>
> >> * Extend the XDP redirect to support explicit allocator/destructor
> >> functions. Right now, XDP redirect assumes that the page allocator
> >> was used, and the XDP redirect cleanup path is decreasing the page
> >> count of the XDP buffer. This assumption breaks for the zerocopy
> >> case.
> >
> > Yes, please. If XDP_REDIRECT gets to call a destructor call-back, then
> > we can allow XDP_REDIRECT out another net_device, even when ZC is
> > enabled on an RX-ring queue.

I will (of course) be eager to test and benchmark this approach, as I
have high hopes of a performance boost even for non-ZC. I know an AF_XDP
approach is a lot of work, but I would like to offer to help out in
any way I can.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer