[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <35fcfb25-583a-e923-6eee-e8bbcc19db17@redhat.com>
Date: Tue, 4 Oct 2022 11:29:17 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: sdf@...gle.com
Cc: brouer@...hat.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
xdp-hints@...-project.net, larysa.zaremba@...el.com,
memxor@...il.com, Lorenzo Bianconi <lorenzo@...nel.org>,
mtahhan@...hat.com,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <borkmann@...earbox.net>,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
dave@...cker.co.uk, Magnus Karlsson <magnus.karlsson@...el.com>,
bjorn@...nel.org
Subject: Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW
offload hints via BTF
On 04/10/2022 01.55, sdf@...gle.com wrote:
> On 09/07, Jesper Dangaard Brouer wrote:
>> This patchset expose the traditional hardware offload hints to XDP and
>> rely on BTF to expose the layout to users.
>
>> Main idea is that the kernel and NIC drivers simply defines the struct
>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>> naturally and automatically described via BTF and implicitly exported to
>> users. NIC drivers populate and records their own BTF ID as the last
>> member in XDP metadata area (making it easily accessible by AF_XDP
>> userspace at a known negative offset from packet data start).
>
>> Naming conventions for the structs (xdp_hints_*) is used such that
>> userspace can find and decode the BTF layout and match against the
>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>> what XDP-hints a driver supports.
>
>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>> union named "xdp_hints_union" in every driver, which contains all
>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>> to find and parse the relevant BTF types. (Seeking input before fixing
>> up all drivers in patchset).
>
>
>> The main different from RFC-v1:
>> - Drop idea of BTF "origin" (vmlinux, module or local)
>> - Instead to use full 64-bit BTF ID that combine object+type ID
>
>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>> those.
>
>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>> required some refactoring to remove the SKB dependencies.
>
> Hey Jesper,
>
> I took a quick look at the series.
Appreciate that! :-)
> Do we really need the enum with the flags?
The primary reason for using enum is that these gets exposed as BTF.
The proposal is that userspace/BTF need to obtain the flags via BTF,
such that they don't become UAPI, but something we can change later.
> We might eventually hit that "first 16 bits are reserved" issue?
>
> Instead of exposing enum with the flags, why not solve it as follows:
> a. We define UAPI struct xdp_rx_hints with _all_ possible hints
How can we know _all_ possible hints from the beginning(?).
UAPI + central struct dictating all possible hints, will limit innovation.
> b. Each device defines much denser <device>_xdp_rx_hints struct with the
> metadata that it supports
Thus, the NIC device is limited to what is defined in UAPI struct
xdp_rx_hints. Again this limits innovation.
> c. The subset of fields in <device>_xdp_rx_hints should match the ones from
> xdp_rx_hints (we essentially standardize on the field names/sizes)
> d. We expose <device>_xdp_rx_hints btf id via netlink for each device
For this proposed design you would still need more than one BTF ID or
<device>_xdp_rx_hints struct's, because not all packets contains all
hints. The most common case is HW timestamping, which some HW only
supports for PTP frames.
Plus, I don't see a need to expose anything via netlink, as we can just
use the existing BTF information from the module. Thus, avoiding to
creating more UAPI.
> e. libbpf will query and do offset relocations for
> xdp_rx_hints -> <device>_xdp_rx_hints at load time
>
> Would that work? Then it seems like we can replace bitfields with the
I used to be a fan of bitfields, until I discovered that they are bad
for performance, because compilers cannot optimize these.
> following:
>
> if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
> /* use that hint */
Fairly often a VLAN will not be set in packets, so we still have to read
and check a bitfield/flag if the VLAN value is valid. (Guess it is
implicit in above code).
> }
>
> All we need here is for libbpf to, again, do xdp_rx_hints ->
> <device>_xdp_rx_hints translation before it evaluates
> bpf_core_field_exists()?
>
> Thoughts? Any downsides? Am I missing something?
>
Well, the downside is primarily that this design limits innovation.
Each time a NIC driver want to introduce a new hardware hint, they have
to update the central UAPI xdp_rx_hints struct first.
The design in the patchset is to open for innovation. Driver can extend
their own xdp_hints_<driver>_xxx struct(s). They still have to land
their patches upstream, but avoid mangling a central UAPI struct. As
upstream we review driver changes and should focus on sane struct member
naming(+size) especially if this "sounds" like a hint/feature that more
driver are likely to support. With help from BTF relocations, a new
driver can support same hint/feature if naming(+size) match (without
necessary the same offset in the struct).
> Also, about the TX side: I feel like the same can be applied there,
> the program works with xdp_tx_hints and libbpf will rewrite to
> <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
> those, presumably, can be relocatable by libbpf as well?
>
Good to think ahead for TX-side, even-though I think we should focus on
landing RX-side first.
I notice your naming xdp_rx_hints vs. xdp_tx_hints. I have named the
common struct xdp_hints_common, without a RX/TX direction indication.
Maybe this is wrong of me, but my thinking was that most of the common
hints can be directly used as TX-side hints. I'm hoping TX-side
xdp-hints will need to do little-to-non adjustment, before using the
hints as TX "instruction". I'm hoping that XDP-redirect will just work
and xmit driver can use XDP-hints area.
Please correct me if I'm wrong.
The checksum fields hopefully translates to similar TX offload "actions".
The VLAN offload hint should translate directly to TX-side.
I can easily be convinced we should name it xdp_hints_rx_common from the
start, but then I will propose that xdp_hints_tx_common have the
checksum and VLAN fields+flags at same locations, such that we don't
take any performance hint for moving them to "TX-side" hints, making
XDP-redirect just work.
--Jesper
Powered by blists - more mailing lists