netdev - Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 4 Oct 2022 11:29:17 +0200
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     sdf@...gle.com
Cc:     brouer@...hat.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
        xdp-hints@...-project.net, larysa.zaremba@...el.com,
        memxor@...il.com, Lorenzo Bianconi <lorenzo@...nel.org>,
        mtahhan@...hat.com,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <borkmann@...earbox.net>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>,
        dave@...cker.co.uk, Magnus Karlsson <magnus.karlsson@...el.com>,
        bjorn@...nel.org
Subject: Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW
 offload hints via BTF


On 04/10/2022 01.55, sdf@...gle.com wrote:
> On 09/07, Jesper Dangaard Brouer wrote:
>> This patchset expose the traditional hardware offload hints to XDP and
>> rely on BTF to expose the layout to users.
> 
>> Main idea is that the kernel and NIC drivers simply defines the struct
>> layouts they choose to use for XDP-hints. These XDP-hints structs gets
>> naturally and automatically described via BTF and implicitly exported to
>> users. NIC drivers populate and records their own BTF ID as the last
>> member in XDP metadata area (making it easily accessible by AF_XDP
>> userspace at a known negative offset from packet data start).
> 
>> Naming conventions for the structs (xdp_hints_*) is used such that
>> userspace can find and decode the BTF layout and match against the
>> provided BTF IDs. Thus, no new UAPI interfaces are needed for exporting
>> what XDP-hints a driver supports.
> 
>> The patch "i40e: Add xdp_hints_union" introduce the idea of creating a
>> union named "xdp_hints_union" in every driver, which contains all
>> xdp_hints_* struct this driver can support. This makes it easier/quicker
>> to find and parse the relevant BTF types.  (Seeking input before fixing
>> up all drivers in patchset).
> 
> 
>> The main different from RFC-v1:
>>   - Drop idea of BTF "origin" (vmlinux, module or local)
>>   - Instead to use full 64-bit BTF ID that combine object+type ID
> 
>> I've taken some of Alexandr/Larysa's libbpf patches and integrated
>> those.
> 
>> Patchset exceeds netdev usually max 15 patches rule. My excuse is three
>> NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and which
>> required some refactoring to remove the SKB dependencies.
> 
> Hey Jesper,
> 
> I took a quick look at the series. 
Appreciate that! :-)

> Do we really need the enum with the flags?

The primary reason for using enum is that these gets exposed as BTF.
The proposal is that userspace/BTF need to obtain the flags via BTF,
such that they don't become UAPI, but something we can change later.

> We might eventually hit that "first 16 bits are reserved" issue?
> 
> Instead of exposing enum with the flags, why not solve it as follows:
> a. We define UAPI struct xdp_rx_hints with _all_ possible hints

How can we know _all_ possible hints from the beginning(?).

UAPI + central struct dictating all possible hints, will limit innovation.

> b. Each device defines much denser <device>_xdp_rx_hints struct with the
>     metadata that it supports

Thus, the NIC device is limited to what is defined in UAPI struct
xdp_rx_hints.  Again this limits innovation.

> c. The subset of fields in <device>_xdp_rx_hints should match the ones from
>     xdp_rx_hints (we essentially standardize on the field names/sizes)
> d. We expose <device>_xdp_rx_hints btf id via netlink for each device

For this proposed design you would still need more than one BTF ID or
<device>_xdp_rx_hints struct's, because not all packets contains all
hints. The most common case is HW timestamping, which some HW only
supports for PTP frames.

Plus, I don't see a need to expose anything via netlink, as we can just
use the existing BTF information from the module.  Thus, avoiding to
creating more UAPI.

> e. libbpf will query and do offset relocations for
>     xdp_rx_hints -> <device>_xdp_rx_hints at load time
> 
> Would that work? Then it seems like we can replace bitfields with the 

I used to be a fan of bitfields, until I discovered that they are bad
for performance, because compilers cannot optimize these.

> following:
> 
>    if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
>      /* use that hint */

Fairly often a VLAN will not be set in packets, so we still have to read
and check a bitfield/flag if the VLAN value is valid. (Guess it is
implicit in above code).

>    }
> 
> All we need here is for libbpf to, again, do xdp_rx_hints ->
> <device>_xdp_rx_hints translation before it evaluates 
> bpf_core_field_exists()?
> 
> Thoughts? Any downsides? Am I missing something?
> 

Well, the downside is primarily that this design limits innovation.

Each time a NIC driver want to introduce a new hardware hint, they have
to update the central UAPI xdp_rx_hints struct first.

The design in the patchset is to open for innovation.  Driver can extend
their own xdp_hints_<driver>_xxx struct(s).  They still have to land
their patches upstream, but avoid mangling a central UAPI struct. As
upstream we review driver changes and should focus on sane struct member
naming(+size) especially if this "sounds" like a hint/feature that more
driver are likely to support.  With help from BTF relocations, a new
driver can support same hint/feature if naming(+size) match (without
necessary the same offset in the struct).

> Also, about the TX side: I feel like the same can be applied there,
> the program works with xdp_tx_hints and libbpf will rewrite to
> <device>_xdp_tx_hints. xdp_tx_hints might have fields like "has_tx_vlan:1";
> those, presumably, can be relocatable by libbpf as well?
> 

Good to think ahead for TX-side, even-though I think we should focus on
landing RX-side first.

I notice your naming xdp_rx_hints vs. xdp_tx_hints.  I have named the
common struct xdp_hints_common, without a RX/TX direction indication.
Maybe this is wrong of me, but my thinking was that most of the common
hints can be directly used as TX-side hints.  I'm hoping TX-side
xdp-hints will need to do little-to-non adjustment, before using the
hints as TX "instruction".  I'm hoping that XDP-redirect will just work
and xmit driver can use XDP-hints area.

Please correct me if I'm wrong.
The checksum fields hopefully translates to similar TX offload "actions".
The VLAN offload hint should translate directly to TX-side.

I can easily be convinced we should name it xdp_hints_rx_common from the
start, but then I will propose that xdp_hints_tx_common have the
checksum and VLAN fields+flags at same locations, such that we don't
take any performance hint for moving them to "TX-side" hints, making
XDP-redirect just work.

--Jesper