Message-ID: <20191219114438.0bcb33ea@carbon>
Date: Thu, 19 Dec 2019 11:44:38 +0100
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: "Jubran, Samih" <sameehj@...zon.com>
Cc: Luigi Rizzo <rizzo@....unipi.it>,
"Machulsky, Zorik" <zorik@...zon.com>,
Daniel Borkmann <borkmann@...earbox.net>,
David Miller <davem@...emloft.net>,
"Tzalik, Guy" <gtzalik@...zon.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Toke Høiland-Jørgensen
<toke@...hat.com>, "Kiyanovski, Arthur" <akiyano@...zon.com>,
Alexei Starovoitov <ast@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
David Ahern <dsahern@...il.com>, brouer@...hat.com
Subject: Re: XDP multi-buffer design discussion
On Wed, 18 Dec 2019 16:03:54 +0000
"Jubran, Samih" <sameehj@...zon.com> wrote:
> > -----Original Message-----
> > From: Luigi Rizzo <rizzo@....unipi.it>
> > Sent: Wednesday, December 18, 2019 12:30 AM
> > To: Jesper Dangaard Brouer <brouer@...hat.com>
> > Cc: Jubran, Samih <sameehj@...zon.com>; Machulsky, Zorik
> > <zorik@...zon.com>; Daniel Borkmann <borkmann@...earbox.net>; David
> > Miller <davem@...emloft.net>; Tzalik, Guy <gtzalik@...zon.com>; Ilias
> > Apalodimas <ilias.apalodimas@...aro.org>; Toke Høiland-Jørgensen
> > <toke@...hat.com>; Kiyanovski, Arthur <akiyano@...zon.com>; Alexei
> > Starovoitov <ast@...nel.org>; netdev@...r.kernel.org; David Ahern
> > <dsahern@...il.com>
> > Subject: Re: XDP multi-buffer design discussion
> >
> > On Tue, Dec 17, 2019 at 12:46 AM Jesper Dangaard Brouer
> > <brouer@...hat.com> wrote:
> > >
> > > On Mon, 16 Dec 2019 20:15:12 -0800
> > > Luigi Rizzo <rizzo@....unipi.it> wrote:
> > >...
> > > > For some use cases, the bpf program could deduce the total
> > > > length by looking at the L3 header.
> > >
> > > Yes, that is actually a good insight. I guess the BPF program
> > > could also use this to detect that it doesn't have access to the
> > > full linear packet this way(?)
> > >
> > > > It won't work for XDP_TX response though.
> > >
> > > The XDP_TX case also needs to be discussed/handled. IMHO we need
> > > to support XDP_TX for multi-buffer frames. XDP_TX *can* be driver
> > > specific, but most drivers choose to convert xdp_buff to
> > > xdp_frame, which makes it possible to use/share part of the
> > > XDP_REDIRECT code from ndo_xdp_xmit.
> > >
> > > We also need to handle XDP_REDIRECT, which becomes challenging, as
> > > the ndo_xdp_xmit functions of *all* drivers need to be updated (or
> > > a flag introduced to handle this incrementally).
> >
> > Here is a possible course of action (please let me know if you find
> > loose ends)
> >
> > 1. extend struct xdp_buff with a total length and sk_buff * (NULL by default);
> >
I don't like extending xdp_buff with an skb pointer. (Remember xdp_buff
is only "allocated" on the stack.)

Perhaps extend xdp_buff with a total-length, *BUT* this can also be
solved in other ways. Extend xdp_buff with a flag indicating that this
xdp_buff has multiple buffers (segments). Then we know that the area
where we store the segments is valid, and we can also store the
total-length there. (We need this total-length later, after the
xdp_buff is freed.)

To make life easier for BPF-developers, we could export/create a
BPF-helper bpf_xdp_total_len(ctx) that hides whether segments are
present or not.
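
To make this concrete, a rough sketch (the flag name, the segment-area
accessor and the helper are all hypothetical; nothing like this exists
in the kernel today):

	#define XDP_FLAGS_MULTI_BUFF	BIT(0)	/* hypothetical flag */

	struct xdp_buff {
		/* ... existing members: data, data_end, data_meta,
		 * data_hard_start, rxq ...
		 */
		u32 flags;	/* new: set when segments are attached */
	};

	/* Hypothetical BPF-helper hiding whether segments exist */
	BPF_CALL_1(bpf_xdp_total_len, struct xdp_buff *, xdp)
	{
		if (!(xdp->flags & XDP_FLAGS_MULTI_BUFF))
			return xdp->data_end - xdp->data;
		/* segment area is valid, total-length lives there */
		return xdp_segment_area(xdp)->total_len; /* hypothetical */
	}

This way a BPF program never needs to know where the total-length
actually lives.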
> > 2. add a netdev callback to construct the skb for the current
> > packet. This code obviously already exists in all drivers; it just
> > needs to be exposed as a function callable by a bpf helper (next
> > bullet);
> >
Today we already have functions that create an SKB from an xdp_frame.
First the xdp_buff is converted to an xdp_frame, whose memory area
lives in the top (32 bytes) of the packet-page. (See the function
convert_to_xdp_frame() in include/net/xdp.h.)

Today two functions create an SKB from this xdp_frame, and they should
likely be consolidated: (1) cpu_map_build_skb and (2) veth_xdp_rcv_one
(and dive into veth_build_skb). (Hint: both use a variant of
build_skb.)

The challenge for you, Samih, is the placement of skb_shared_info
within the packet-page memory area. These two xdp_frame-to-SKB
functions place skb_shared_info after the packet length xdp_frame->len
(aligned). IMHO the placement of skb_shared_info should NOT vary like
this. Instead it should be at xdp->data_hard_end - 320 bytes (the size
of skb_shared_info). The first challenge is fixing this...
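
Something like this untested sketch. It assumes the hard-end of the
packet-page is derivable from the xdp_buff (today only data_hard_start
is stored, so the xdp_data_hard_end() accessor below is hypothetical):

	/* Fixed placement: skb_shared_info always sits at the end of
	 * the packet-page, independent of the packet length.
	 */
	static inline struct skb_shared_info *
	xdp_get_shared_info(struct xdp_buff *xdp)
	{
		void *hard_end = xdp_data_hard_end(xdp); /* hypothetical */

		return (struct skb_shared_info *)
			(hard_end - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
	}

With a fixed placement, both the driver RX path and the two
xdp_frame-to-SKB functions can locate skb_shared_info without knowing
the packet length.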
> > 3. add a new helper 'bpf_create_skb' that when invoked calls the
> > previously mentioned netdev callback to construct an skb for
> > the current packet, and sets the pointer in the xdp_buff, if not
> > there already. A bpf program that needs to access segments beyond
> > the first one can call bpf_create_skb() if needed, and then use
> > existing helpers (skb_load_bytes, skb_store_bytes, etc.) to access
> > the skb.
> >
I respectfully disagree with this approach. One reason is that we want
to support creating SKBs on a remote CPU, like we do today with CPUMAP
redirect. The proposed helper 'bpf_create_skb' implies that this
happens in BPF-context, during the driver's NAPI loop.
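
To recall the CPUMAP flow (heavily simplified from kernel/bpf/cpumap.c):

	/* RX CPU, NAPI context: on XDP_REDIRECT into a cpumap only the
	 * raw xdp_frame is enqueued -- no SKB exists at this point.
	 */
	xdpf = convert_to_xdp_frame(&xdp);
	bq_enqueue(rcpu, xdpf);

	/* Remote CPU, cpumap kthread context: the SKB is first
	 * created here, outside the RX NAPI loop.
	 */
	skb = cpu_map_build_skb(rcpu, xdpf);
	netif_receive_skb_core(skb);

A bpf_create_skb helper running in the RX NAPI loop would defeat
exactly this split.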
> > My rationale is that if we need to access multiple segments, we
> > are already in expensive territory, and it makes little sense to
> > define a multi-segment format that would essentially be an skb.
> >
I also disagree. Multi-segment frames also have some high-speed
use-cases. Especially HW header-split at the IP+TCP boundary is
interesting.
> >
> > 4. implement a mechanism to let the driver know whether the
> > currently loaded bpf program understands the new format. There
> > are multiple ways to do that; a trivial one would be to check,
> > during load, that the program calls some known helper, e.g.
> > bpf_understands_fragments(), which is then jit-ed to something
> > inexpensive
> >
> > Note that today, a netdev that cannot guarantee single segment
> > packets would not be able to enable xdp. Hence, without loss of
> > functionality, such a netdev can refuse to load a program without
> > bpf_understands_fragments().
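
For reference, such load-time detection could look something like this
(the helper func-id, the aux bit and the driver-side check are all
hypothetical, not existing API):

	/* Verifier side: while checking helper calls, record that the
	 * program called the marker helper.
	 */
	if (func_id == BPF_FUNC_understands_fragments)	/* hypothetical */
		env->prog->aux->understands_fragments = true;

	/* Driver side, in its ndo_bpf / XDP_SETUP_PROG handler: */
	if (netdev_rx_may_fragment(dev) &&		/* hypothetical */
	    !prog->aux->understands_fragments)
		return -EOPNOTSUPP;	/* refuse, same as refusing XDP today */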
> >
> >
> > With all the above, the generic xdp handler would do the following:
> >
> >     if (!skb_is_linear() && !bpf_understands_fragments()) {
> >             <linearize skb>;
> >     }
> >     <construct xdp_buff with first segment and skb>
> >         // skb is unused by old style programs
> >     <call bpf program>
> >
> > The native driver for a device that cannot guarantee a single
> > segment would just refuse to load a program that does not
> > understand them (same as today), so the code would be:
> >
> >     <construct xdp_buff with first segment and empty skb>
> >     <call bpf program>
> >
> > On return, we might find that an skb has been built by the xdp
> > program, and it can be immediately used for XDP_PASS (or dropped
> > in case of XDP_DROP)
I also disagree here. The SKB should only be created AFTER we know
that the action will be XDP_PASS.
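
To illustrate the ordering I have in mind (a rough driver-side sketch;
xdp_build_skb is a hypothetical consolidated builder, not an existing
function):

	u32 act = bpf_prog_run_xdp(xdp_prog, &xdp);

	switch (act) {
	case XDP_PASS:
		/* Only materialize the SKB once we know it is needed */
		skb = xdp_build_skb(&xdp);	/* hypothetical */
		napi_gro_receive(napi, skb);
		break;
	case XDP_TX:
	case XDP_REDIRECT:
		/* stays in xdp_buff/xdp_frame land: driver xmit or
		 * xdp_do_redirect(), no SKB is ever allocated
		 */
		break;
	case XDP_DROP:
	default:
		xdp_return_buff(&xdp);	/* recycle, e.g. via page_pool */
		break;
	}

This keeps XDP_TX, XDP_REDIRECT and XDP_DROP free of SKB-allocation
cost, and the SKB build for XDP_PASS can reuse the consolidated
xdp_frame-to-SKB function mentioned above.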
> > For XDP_TX and XDP_REDIRECT, something similar:
> > if the packet is a single segment and there is no skb, use the
> > existing accelerated path. If there are multiple segments,
> > construct the skb if it does not already exist, and pass it to the
> > standard tx path.
> >
> > cheers
> > luigi
>
> I have gone over your suggestion, and it looks good to me! I couldn't
> find any loose ends. One thing to note is that the driver now needs
> to save the context of the currently processed packet for each queue,
> so that it can support the netdev callback for creating the skb.
>
> This sounds a bit messy, but I think it should work.
>
> I'd love to hear more thoughts on this,
> Jesper, Toke what do you guys think?
As you can see from my comments I (respectfully) disagree with this
approach.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer