Message-ID: <87ikmle9t4.fsf@cloudflare.com>
Date: Wed, 30 Apr 2025 21:19:51 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>, Toke
Høiland-Jørgensen <toke@...hat.com>, Arthur Fabre
<arthur@...hurfabre.com>
Cc: Network Development <netdev@...r.kernel.org>, bpf
<bpf@...r.kernel.org>, Jesper Dangaard Brouer <hawk@...nel.org>, Yan
Zhai <yan@...udflare.com>, jbrandeburg@...udflare.com,
lbiancon@...hat.com, Alexei Starovoitov <ast@...nel.org>, Jakub
Kicinski <kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
kernel-team@...udflare.com
Subject: Re: [PATCH RFC bpf-next v2 01/17] trait: limited KV store for
packet metadata
On Wed, Apr 30, 2025 at 11:19 AM +02, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@...il.com> writes:
>
>> On Fri, Apr 25, 2025 at 12:27 PM Arthur Fabre <arthur@...hurfabre.com> wrote:
>>>
>>> On Thu Apr 24, 2025 at 6:22 PM CEST, Alexei Starovoitov wrote:
>>> > On Tue, Apr 22, 2025 at 6:23 AM Arthur Fabre <arthur@...hurfabre.com> wrote:
[...]
>>> * Hardware metadata: metadata exposed from NICs (like the receive
>>> timestamp, 4 tuple hash...) is currently only exposed to XDP programs
>>> (via kfuncs).
>>> But that doesn't expose them to the rest of the stack.
>>> Storing them in traits would allow XDP, other BPF programs, and the
>>> kernel to access and modify them (for example to take into account
>>> decapsulating a packet).
>>
>> Sure. If traits == existing metadata bpf prog in xdp can communicate
>> with bpf prog in skb layer via that "trait" format.
>> xdp can take tuple hash and store it as key==0 in the trait.
>> The kernel doesn't need to know how to parse that format.
>
> Yes it does, to propagate it to the skb later. I.e.,
>
> XDP prog on NIC: get HW hash, store in traits, redirect to CPUMAP
> CPUMAP: build skb, read hash from traits, populate skb hash
>
> Same thing for (at least) timestamps and checksums.
>
> Longer term, with traits available we could move more skb fields into
> traits to make struct sk_buff smaller (by moving optional fields to
> traits that don't take up any space if they're not set).
Perhaps we can have the cake and eat it too.
We could leave the traits encoding/decoding out of the kernel and, at
the same time, *expose it* to the network stack through BPF struct_ops
programs. The interface would be high level, for example
->get_rx_hash(), rather than individual K/V access. The traits_ops
vtable could grow as needed to support new use cases.
If you think about it, it's not so different from BPF-powered congestion
algorithms and scheduler extensions. They also expose some state, kept in
maps, that only the loaded BPF code knows how to operate on.
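To make the shape of this concrete, here is a rough userspace sketch in
plain C, not kernel code. Everything in it is hypothetical: struct pkt,
struct fake_skb, the prog_* functions, stack_build_skb(), and the layout
of struct traits_ops are made up for illustration. Only the names
traits_ops and ->get_rx_hash() come from the text above. The point is
just that the "stack" side calls through the vtable and never parses the
trait bytes itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet with an opaque trait area. The stack treats the
 * bytes as a black box; only the loaded "program" knows the layout. */
struct pkt {
	uint8_t traits[16];
};

/* Hypothetical vtable, modeled loosely on a struct_ops interface. */
struct traits_ops {
	bool (*get_rx_hash)(const struct pkt *pkt, uint32_t *hash);
	void (*set_rx_hash)(struct pkt *pkt, uint32_t hash);
};

/* "Program" side. Private encoding (an assumption of this sketch):
 * byte 0 is a presence flag, bytes 1..4 hold the hash. */
static void prog_set_rx_hash(struct pkt *pkt, uint32_t hash)
{
	pkt->traits[0] = 1;
	memcpy(&pkt->traits[1], &hash, sizeof(hash));
}

static bool prog_get_rx_hash(const struct pkt *pkt, uint32_t *hash)
{
	if (!pkt->traits[0])
		return false;
	memcpy(hash, &pkt->traits[1], sizeof(*hash));
	return true;
}

static const struct traits_ops prog_ops = {
	.get_rx_hash = prog_get_rx_hash,
	.set_rx_hash = prog_set_rx_hash,
};

/* "Stack" side, standing in for the point where an skb is built from
 * a redirected frame. It only knows the vtable, not the encoding. */
struct fake_skb {
	uint32_t hash;
	bool hash_valid;
};

static void stack_build_skb(const struct traits_ops *ops,
			    const struct pkt *pkt, struct fake_skb *skb)
{
	skb->hash = 0;
	skb->hash_valid = ops && ops->get_rx_hash &&
			  ops->get_rx_hash(pkt, &skb->hash);
}
```

With a zeroed struct pkt, stack_build_skb() leaves hash_valid false.
After prog_ops.set_rx_hash(&p, h), it fills in skb->hash with h.
Swapping in a different encoding would only touch the prog_* functions.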