Message-ID: <87cza43nlu.fsf@toke.dk>
Date: Thu, 03 Nov 2022 01:09:33 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Jesper Dangaard Brouer <jbrouer@...hat.com>,
Martin KaFai Lau <martin.lau@...ux.dev>, brouer@...hat.com,
"Bezdeka, Florian" <florian.bezdeka@...mens.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"alexandr.lobakin@...el.com" <alexandr.lobakin@...el.com>,
"anatoly.burakov@...el.com" <anatoly.burakov@...el.com>,
"song@...nel.org" <song@...nel.org>,
"Deric, Nemanja" <nemanja.deric@...mens.com>,
"andrii@...nel.org" <andrii@...nel.org>,
"Kiszka, Jan" <jan.kiszka@...mens.com>,
"magnus.karlsson@...il.com" <magnus.karlsson@...il.com>,
"willemb@...gle.com" <willemb@...gle.com>,
"ast@...nel.org" <ast@...nel.org>, "yhs@...com" <yhs@...com>,
"kpsingh@...nel.org" <kpsingh@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"mtahhan@...hat.com" <mtahhan@...hat.com>,
"xdp-hints@...-project.net" <xdp-hints@...-project.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"haoluo@...gle.com" <haoluo@...gle.com>
Subject: Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

Stanislav Fomichev <sdf@...gle.com> writes:

> On Wed, Nov 2, 2022 at 3:02 PM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>>
>> Jesper Dangaard Brouer <jbrouer@...hat.com> writes:
>>
>> > On 01/11/2022 18.05, Martin KaFai Lau wrote:
>> >> On 10/31/22 6:59 PM, Stanislav Fomichev wrote:
>> >>> On Mon, Oct 31, 2022 at 3:57 PM Martin KaFai Lau
>> >>> <martin.lau@...ux.dev> wrote:
>> >>>>
>> >>>> On 10/31/22 10:00 AM, Stanislav Fomichev wrote:
>> >>>>>> 2. AF_XDP programs won't be able to access the metadata without
>> >>>>>> using a custom XDP program that calls the kfuncs and puts the data
>> >>>>>> into the metadata area. We could solve this with some code in
>> >>>>>> libxdp, though; if this code can be made generic enough (so it just
>> >>>>>> dumps the available metadata functions from the running kernel at
>> >>>>>> load time), it may be forward-compatible with new versions of the
>> >>>>>> kernel that add new fields, which should alleviate Florian's
>> >>>>>> concern about keeping things in sync.
>> >>>>>
>> >>>>> Good point. I had to convert to a custom program to use the kfuncs :-(
>> >>>>> But your suggestion sounds good; maybe libxdp can accept some extra
>> >>>>> info about at which offset the user would like to place the metadata
>> >>>>> and the library can generate the required bytecode?
>> >>>>>
>> >>>>>> 3. It will make it harder to consume the metadata when building
>> >>>>>> SKBs. I think the CPUMAP and veth use cases are also quite
>> >>>>>> important, and that we want metadata to be available for building
>> >>>>>> SKBs in this path. Maybe this can be resolved by having a
>> >>>>>> convenient kfunc for this that can be used for programs doing such
>> >>>>>> redirects. E.g., you could just call xdp_copy_metadata_for_skb()
>> >>>>>> before doing the bpf_redirect, and that would recursively expand
>> >>>>>> into all the kfunc calls needed to extract the metadata supported
>> >>>>>> by the SKB path?
>> >>>>>
>> >>>>> So this xdp_copy_metadata_for_skb will create a metadata layout that
>> >>>>
>> >>>> Can the xdp_copy_metadata_for_skb be written as a bpf prog itself?
>> >>>> Not sure where the best point to specify this prog is, though.
>> >>>> Somehow during bpf_xdp_redirect_map?
>> >>>> Or does this prog belong to the target cpumap, and the xdp prog
>> >>>> redirecting to this cpumap has to write the meta layout in a way
>> >>>> that the cpumap is expecting?
>> >>>
>> >>> We're probably interested in triggering it from the places where xdp
>> >>> frames can eventually be converted into skbs?
>> >>> So for plain 'return XDP_PASS' and things like bpf_redirect/etc? (IOW,
>> >>> anything that's not XDP_DROP / AF_XDP redirect).
>> >>> We can probably make it magically work, and can generate
>> >>> kernel-digestible metadata whenever data == data_meta, but the
>> >>> question - should we?
>> >>> (need to make sure we won't regress any existing cases that are not
>> >>> relying on the metadata)
>> >>
>> >> Instead of having some kernel-digestible meta data, how about calling
>> >> another bpf prog to initialize the skb fields from the meta area after
>> >> __xdp_build_skb_from_frame() in the cpumap, so
>> >> run_xdp_set_skb_fields_from_metadata() may be a better name.
>> >>
>> >
>> > I very much like this idea of calling another bpf prog to initialize the
>> > SKB fields from the meta area. (As a reminder, data needs to come from
>> > the meta area, because at this point the hardware RX-desc is out-of-scope).
>> > I'm onboard with xdp_copy_metadata_for_skb() populating the meta area.
>> >
>> > We could invoke this BPF-prog inside __xdp_build_skb_from_frame().
>> >
>> > We might need a new BPF_PROG_TYPE_XDP2SKB as this new BPF-prog
>> > run_xdp_set_skb_fields_from_metadata() would need both xdp_buff + SKB as
>> > context inputs. Right? (Not sure if this is acceptable under the BPF
>> > maintainers' new rules)
>> >
>> >> The xdp_prog@rx sets the meta data and then redirects. If the
>> >> xdp_prog@rx can also specify a xdp prog to initialize the skb fields
>> >> from the meta area, then there is no need to have a kfunc to enforce a
>> >> kernel-digestible layout. Not sure what is a good way to specify this
>> >> xdp_prog though...
>> >
>> > The challenge of running this (BPF_PROG_TYPE_XDP2SKB) BPF-prog inside
>> > __xdp_build_skb_from_frame() is that it needs to know how to decode the
>> > meta area for every device driver or XDP-prog populating this (as veth
>> > and cpumap can get redirected packets from multiple device drivers).
>>
>> If we have the helper to copy the data "out of" the drivers, why do we
>> need a second BPF program to copy data to the SKB?
>>
>> I.e., the XDP program calls xdp_copy_metadata_for_skb(); this invokes
>> each of the kfuncs needed for the metadata used by SKBs, all of which
>> get unrolled. The helper takes the output of these metadata-extracting
>> kfuncs and stores it "somewhere". This "somewhere" could well be the
>> metadata area; but in any case, since it's hidden away inside a helper
>> (or kfunc) from the calling XDP program's PoV, the helper can just stash
>> all the data in a fixed format, which __xdp_build_skb_from_frame() can
>> then just read statically. We could even make this format match the
>> field layout of struct sk_buff, so all we have to do is memcpy a
>> contiguous chunk of memory when building the SKB.
>
> +1
>
> I'm currently doing exactly what you're suggesting (minus matching skb layout):
>
> struct xdp_to_skb_metadata {
> u32 magic; // randomized at boot
> ... skb-consumable-metadata in fixed format
> } __randomize_layout;
>
> bpf_xdp_copy_metadata_for_skb() does bpf_xdp_adjust_meta(ctx,
> -sizeof(struct xdp_to_skb_metadata)) and then calls a bunch of kfuncs
> to fill in the actual data.
>
> Then, at __xdp_build_skb_from_frame time, I have regular kernel
> C code that parses that 'struct xdp_to_skb_metadata'.
> (To be precise, I'm trying to parse the metadata from
> skb_metadata_set; it's called from __xdp_build_skb_from_frame, but I'm not
> 100% sure that's the right place).
> (I also randomize the layout and magic to make sure userspace doesn't
> depend on it because nothing stops this packet from being routed into an
> xsk socket..)

Ah, nice trick with __randomize_layout - I agree we need to do something
to prevent userspace from inadvertently starting to rely on this, and
this seems like a great solution!
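
To make the scheme concrete, here is a minimal userspace sketch of the
producer/consumer split described above. It is an illustration under stated
assumptions, not the code from the RFC: fake_xdp_buff, fake_skb, the rx_hash
and rx_timestamp fields, and every function name below are hypothetical
stand-ins. The magic is picked once at startup to mimic "randomized at
boot"; the __randomize_layout attribute itself is a kernel build feature and
is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-format blob; the field names are illustrative only. */
struct xdp_to_skb_metadata {
	uint32_t magic;		/* picked once at startup */
	uint32_t rx_hash;
	uint64_t rx_timestamp;
};

/* Stand-in for an XDP buffer: metadata sits directly in front of data. */
struct fake_xdp_buff {
	uint8_t buf[256];
	size_t data_off;	/* start of the packet payload in buf */
	size_t meta_off;	/* start of metadata; == data_off when empty */
};

/* Stand-in for the few sk_buff fields the consumer fills in. */
struct fake_skb {
	uint32_t hash;
	uint64_t tstamp;
};

static uint32_t meta_magic;

/* Choose the magic once; a kernel would use a proper random source. */
static void meta_init(unsigned int seed)
{
	srand(seed);
	meta_magic = (uint32_t)rand() | 1u;	/* never zero */
}

static void fake_xdp_init(struct fake_xdp_buff *xdp, size_t headroom)
{
	if (headroom > sizeof(xdp->buf))
		headroom = sizeof(xdp->buf);
	memset(xdp, 0, sizeof(*xdp));
	xdp->data_off = headroom;
	xdp->meta_off = headroom;
}

/*
 * Producer side: grow the metadata area by sizeof(struct), which mimics
 * bpf_xdp_adjust_meta(ctx, -sizeof(...)), then store the fixed format.
 * Returns 0 on success, -1 if there is not enough headroom.
 */
static int copy_metadata_for_skb(struct fake_xdp_buff *xdp,
				 uint32_t hash, uint64_t ts)
{
	struct xdp_to_skb_metadata md;

	if (xdp->meta_off < sizeof(md))
		return -1;

	memset(&md, 0, sizeof(md));
	md.magic = meta_magic;
	md.rx_hash = hash;
	md.rx_timestamp = ts;

	xdp->meta_off -= sizeof(md);
	memcpy(xdp->buf + xdp->meta_off, &md, sizeof(md));
	return 0;
}

/*
 * Consumer side: only trust the metadata area if its size and magic match.
 * Returns 1 if the skb fields were populated, 0 if the area was ignored.
 */
static int build_skb_from_frame(const struct fake_xdp_buff *xdp,
				struct fake_skb *skb)
{
	struct xdp_to_skb_metadata md;

	memset(skb, 0, sizeof(*skb));
	if (xdp->data_off - xdp->meta_off != sizeof(md))
		return 0;	/* empty, or some other program's layout */

	memcpy(&md, xdp->buf + xdp->meta_off, sizeof(md));
	if (md.magic != meta_magic)
		return 0;

	skb->hash = md.rx_hash;
	skb->tstamp = md.rx_timestamp;
	return 1;
}
```

A caller would run meta_init() once, call copy_metadata_for_skb() where the
XDP program would, and build_skb_from_frame() where the SKB gets built. The
point of the magic check is that a metadata area written by an unrelated XDP
program, or one that is simply empty, gets ignored instead of being
misparsed.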

Look forward to seeing what the whole thing looks like in a more
complete form :)

-Toke