[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKH8qBtSVr8CQfSnKC_GdL80KTgj-+kykZ8chXnhqJ1pcL6ZTA@mail.gmail.com>
Date: Wed, 18 Jan 2023 09:55:30 -0800
From: Stanislav Fomichev <sdf@...gle.com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: brouer@...hat.com, bpf@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
song@...nel.org, yhs@...com, john.fastabend@...il.com,
kpsingh@...nel.org, haoluo@...gle.com, jolsa@...nel.org,
David Ahern <dsahern@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Anatoly Burakov <anatoly.burakov@...el.com>,
Alexander Lobakin <alexandr.lobakin@...el.com>,
Magnus Karlsson <magnus.karlsson@...il.com>,
Maryam Tahhan <mtahhan@...hat.com>, xdp-hints@...-project.net,
netdev@...r.kernel.org, David Vernet <void@...ifault.com>,
Björn Töpel <bjorn@...nel.org>
Subject: Re: [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata
On Wed, Jan 18, 2023 at 6:28 AM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
> On 17/01/2023 21.33, Stanislav Fomichev wrote:
> > On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer
> > <jbrouer@...hat.com> wrote:
> >>
> >> On 12/01/2023 01.32, Stanislav Fomichev wrote:
> >>> Document all current use-cases and assumptions.
> >>>
> [...]
> >>> Acked-by: David Vernet <void@...ifault.com>
> >>> Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> >>> ---
> >>> Documentation/networking/index.rst | 1 +
> >>> Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++
> >>> 2 files changed, 109 insertions(+)
> >>> create mode 100644 Documentation/networking/xdp-rx-metadata.rst
> >>>
> >>> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> >>> index 4f2d1f682a18..4ddcae33c336 100644
> >>> --- a/Documentation/networking/index.rst
> >>> +++ b/Documentation/networking/index.rst
> [...cut...]
> >>> +AF_XDP
> >>> +======
> >>> +
> >>> +:doc:`af_xdp` use-case implies that there is a contract between the BPF
> >>> +program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and
> >>> +the final consumer. Thus the BPF program manually allocates a fixed number of
> >>> +bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
> >>> +of kfuncs to populate it. The userspace ``XSK`` consumer computes
> >>> +``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata.
> >>> +Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
> >>> +``METADATA_SIZE`` is an application-specific constant.
> >>
> >> The main problem with AF_XDP and metadata is that, the AF_XDP descriptor
> >> doesn't contain any info about the length METADATA_SIZE.
> >>
> >> The text does says this, but in a very convoluted way.
> >> I think this challenge should be more clearly spelled out.
> >>
> >> (p.s. This was something that XDP-hints via BTF have a proposed solution
> >> for)
> >
> > Any suggestions on how to clarify it better? I have two hints:
> > 1. ``METADATA_SIZE`` is an application-specific constant
> > 2. note missing ``data_meta`` pointer
> >
> > Do you prefer I also add a sentence where I spell it out more
> > explicitly? Something like:
> >
> > Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
> > ``METADATA_SIZE`` is an application-specific constant (``AF_XDP``
> > receive descriptor does _not_ explicitly carry the size of the
> > metadata).
>
> That addition works for me.
> (Later we can hopefully come up with a solution for this)
>
> >>> +
> >>> +Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::
> >>
> >> The "note" also hint to this issue.
> >
> > This seems like an explicit design choice of the AF_XDP? In theory, I
> > don't see why we can't have a v2 receive descriptor format where we
> > return the size of the metadata?
>
> (Cc. Magnus+Bjørn)
> Yes, it was a design choice from AF_XDP not to include the metadata length.
>
> The AF_XDP descriptor, see struct xdp_desc (below) from
> /include/uapi/linux/if_xdp.h.
>
> /* Rx/Tx descriptor */
> struct xdp_desc {
> __u64 addr;
> __u32 len;
> __u32 options;
> };
>
> Does contain a 'u32 options' field, that we could use.
>
> In previous discussions, the proposed solution (from Bjørn+Magnus) was
> to use some bits in the 'options' field to say metadata is present, and
> xsk_umem__get_data minus 4 (or 8) bytes contain a BTF_ID. The AF_XDP
> programmer can then get the metadata length by looking up the BTF_ID.
Yeah, that's one way to do it. Instead of BTF_ID, we can just put the
size of the metadata.
But it wasn't needed for the use-cases described in this patchset.
For redirect use-case, I agree, we might carry some extra information
about the layout, up to you.
> >>> +
> >>> + +----------+-----------------+------+
> >>> + | headroom | custom metadata | data |
> >>> + +----------+-----------------+------+
> >>> + ^
> >>> + |
> >>> + rx_desc->address
> >>> +
> >>> +XDP_PASS
> >>> +========
> >>> +
> >>> +This is the path where the packets processed by the XDP program are passed
> >>> +into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
> >>> +contents. Currently, every driver has custom kernel code to parse
> >>> +the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
> >>> +conversion, and the XDP metadata is not used by the kernel when building
> >>> +``skbs``. However, TC-BPF programs can access the XDP metadata area using
> >>> +the ``data_meta`` pointer.
> >>> +
> >>> +In the future, we'd like to support a case where an XDP program
> >>> +can override some of the metadata used for building ``skbs``.
> >>
> >> Happy this is mentioned as future work.
> >
> > As mentioned in a separate email, if you prefer to focus on that, feel
>
> Yes, I'm going to work on PoC code that explore this area.
>
> > free to drive it since I'm gonna look into the TX side first.
>
> Happy to hear you are going to look into TX-side.
> Are your use case related to TX timestamping?
Yes, I'll try to start with a tx timestamp. But looking (briefly) at
the code, that seems like a more invasive addition.
If we want to return that tx timstamp via the original umem, some
completion mechanism has to be added (besides the existing one it).
LMK if you have some pointers to the previous discussions. Or maybe
Bjørn/Magnus had some plans/ideas about that?
> --Jesper
>
Powered by blists - more mailing lists