[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87ecnsv0g0.fsf@cloudflare.com>
Date: Wed, 14 Jan 2026 13:33:03 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Jesper Dangaard Brouer <hawk@...nel.org>, Alexei Starovoitov
<ast@...nel.org>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, netdev@...r.kernel.org, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Simon Horman
<horms@...nel.org>, Michael Chan <michael.chan@...adcom.com>, Pavan
Chebbi <pavan.chebbi@...adcom.com>, Andrew Lunn <andrew+netdev@...n.ch>,
Tony Nguyen <anthony.l.nguyen@...el.com>, Przemek Kitszel
<przemyslaw.kitszel@...el.com>, Saeed Mahameed <saeedm@...dia.com>, Leon
Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>, Mark
Bloch <mbloch@...dia.com>, Daniel Borkmann <daniel@...earbox.net>, John
Fastabend <john.fastabend@...il.com>, Stanislav Fomichev
<sdf@...ichev.me>, intel-wired-lan@...ts.osuosl.org,
bpf@...r.kernel.org, kernel-team@...udflare.com, Jesse Brandeburg
<jbrandeburg@...udflare.com>, Willem Ferguson <wferguson@...udflare.com>,
Arthur Fabre <arthur@...hurfabre.com>
Subject: Re: [Intel-wired-lan] [PATCH net-next 00/10] Call skb_metadata_set
when skb->data points past metadata
On Wed, Jan 14, 2026 at 12:49 PM +01, Toke Høiland-Jørgensen wrote:
> Jakub Sitnicki via Intel-wired-lan <intel-wired-lan@...osl.org> writes:
>
>> On Tue, Jan 13, 2026 at 07:52 PM +01, Jesper Dangaard Brouer wrote:
>>> *BUT* this patchset isn't doing that. To me it looks like a cleanup
>>> patchset that simply makes it consistent when skb_metadata_set() called.
>>> Selling it as a pre-requirement for doing copy later seems fishy.
>>
>> Fair point on the framing. The interface cleanup is useful on its own -
>> I should have presented it that way rather than tying it to future work.
>>
>>> Instead of blindly copying XDP data_meta area into a single SKB
>>> extension. What if we make it the responsibility of the TC-ingress BPF-
>>> hook to understand the data_meta format and via (kfunc) helpers
>>> transfer/create the SKB extension that it deems relevant.
>>> Would this be an acceptable approach that makes it easier to propagate
>>> metadata deeper in netstack?
>>
>> I think you and Jakub are actually proposing the same thing.
>>
>> If we can access a buffer tied to an skb extension from BPF, this could
>> act as skb-local storage and solves the problem (with some operational
>> overhead to set up TC on ingress).
>>
>> I'd also like to get Alexei's take on this. We had a discussion before
>> about not wanting to maintain two different storage areas for skb
>> metadata.
>>
>> That was one of two reasons why we abandoned Arthur's patches and why I
>> tried to make the existing headroom-backed metadata area work.
>>
>> But perhaps I misunderstood the earlier discussion. Alexei's point may
>> have been that we don't want another *headroom-backed* metadata area
>> accessible from XDP, because we already have that.
>>
>> Looks like we have two options on the table:
>>
>> Option A) Headroom-backed metadata
>> - Use existing skb metadata area
>> - Patch skb_push/pull call sites to preserve it
>>
>> Option B) Extension-backed metadata
>> - Store metadata in skb extension from BPF
>> - TC BPF copies/extracts what it needs from headroom-metadata
>>
>> Or is there an Option C I'm missing?
>
> Not sure if it's really an option C, but would it be possible to
> consolidate them using verifier tricks? I.e., the data_meta field in the
> __sk_buff struct is really a virtual pointer that the verifier rewrites
> to loading an actual pointer from struct bpf_skb_data_end in skb->cb. So
> in principle this could be loaded from an skb extension instead with the
> BPF programs being none the wiser.
>
> There's the additional wrinkle that the end of the data_meta pointer is
> compared to the 'data' start pointer to check for overflow, which
> wouldn't work anymore. Not sure if there's a way to make the verifier
> rewrite those checks in a compatible way, or if this is even a path we
> want to go down. But it would be a pretty neat way to make the whole
> thing transparent and backwards compatible, I think :)
I gave it a shot when working on [1]. Here's the challenge:
1) Keep the skb->data_meta + N <= skb->data checks working
This is what guarantees that your BPF program won't access memory
outside of the metadata area. So you can't rewrite the skb->data_meta
pseudo-pointer load. This means you must...
2) Patch the skb->data_meta pointer dereference after the check
Since deref happens at some unknown point after the skb->data_meta
pointer load, you may no longer have the context pointer in any of the
registers.
You might be able to hack it by spilling the context pointer to the
stack in the prologue, like I've seen bpf_qdisc does. But that I haven't
tried.
In general, I view it as a seconary issue since you can use a BPF dynptr
to access the skb metadata. It was exactly for that reason - to hide the
fact where the metadata is actually located.
> Other than that, I like the extention-backed metadata idea!
That's what I'm going to work on. I look at it as an skb local storage
backed by an skb extension.
If the user wants to transfer the contents of the skb metadata into
local storage, they can. But the extra allocation is their decision.
[1] https://lore.kernel.org/r/20260110-skb-meta-fixup-skb_metadata_set-calls-v1-0-1047878ed1b0@cloudflare.com
Powered by blists - more mailing lists