lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <243ea894-3bf3-4c10-b012-d4451e7ec17e@linux.dev>
Date: Tue, 27 Jan 2026 11:33:30 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jakub Sitnicki <jakub@...udflare.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, Michael Chan <michael.chan@...adcom.com>,
 Pavan Chebbi <pavan.chebbi@...adcom.com>, Andrew Lunn
 <andrew+netdev@...n.ch>, Tony Nguyen <anthony.l.nguyen@...el.com>,
 Przemek Kitszel <przemyslaw.kitszel@...el.com>,
 Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
 Tariq Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 John Fastabend <john.fastabend@...il.com>,
 Stanislav Fomichev <sdf@...ichev.me>, intel-wired-lan@...ts.osuosl.org,
 bpf@...r.kernel.org, kernel-team@...udflare.com,
 Jakub Kicinski <kuba@...nel.org>, Amery Hung <ameryhung@...il.com>
Subject: Re: [PATCH net-next 00/10] Call skb_metadata_set when skb->data
 points past metadata



On 1/25/26 11:15 AM, Jakub Sitnicki wrote:
> On Thu, Jan 22, 2026 at 12:21 PM -08, Martin KaFai Lau wrote:
>> On 1/13/26 4:33 AM, Jakub Sitnicki wrote:
>>> Good point. I'm hoping we don't have to allocate from
>>> skb_metadata_set(), which does sound prohibitively expensive. Instead
>>> we'd allocate the extension together with the skb if we know upfront
>>> that metadata will be used.
>>
>> [ Sorry for being late. Have been catching up after holidays. ]
>>
>> For the sk local storage (which other replies mentioned when suggesting that skb
>> metadata look more like sk local storage), there is a plan (Amery has been
>> looking into it) to allocate the storage together with sk for performance
>> reasons. This means allocating a larger 'struct sock'. The extra space will be
>> at the front of sk instead of the end of sk, because 'struct sock' is embedded
>> at the start of tcp_sock/udp_sock/... If skb is going in the same direction, it
>> would be useful to have a similar scheme: upfront allocation, then sharing by
>> multiple BPF progs.
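A minimal C sketch of why the extra space goes in front of the sock, assuming toy stand-ins for 'struct sock' and 'struct tcp_sock' (not the kernel definitions) and a hypothetical fixed BPF_SK_HEAD_SPACE. Because the protocol socket embeds 'struct sock' as its first member, the bytes right after 'struct sock' already belong to the protocol, so the BPF space is carved out before it and reached at a fixed negative offset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins, not the kernel definitions. */
struct sock {
	int sk_family;
};

struct tcp_sock {
	struct sock sk;    /* first member: (struct sock *) casts rely on this */
	uint32_t snd_cwnd; /* protocol fields follow struct sock directly */
};

/* Hypothetical: bytes of BPF space placed in front of every sock. */
#define BPF_SK_HEAD_SPACE 64

/* Allocate [BPF head space][tcp_sock] as one block and return the sock.
 * Tail space would not work: the bytes after struct sock are tcp_sock's
 * (or udp_sock's, ...) and differ per protocol. */
static struct sock *toy_sk_alloc(void)
{
	uint8_t *base = calloc(1, BPF_SK_HEAD_SPACE + sizeof(struct tcp_sock));

	if (!base)
		return NULL;
	return (struct sock *)(base + BPF_SK_HEAD_SPACE);
}

/* The BPF area sits at a fixed negative offset from the sock. */
static void *toy_sk_bpf_space(struct sock *sk)
{
	return (uint8_t *)sk - BPF_SK_HEAD_SPACE;
}

static void toy_sk_free(struct sock *sk)
{
	free(toy_sk_bpf_space(sk)); /* the block starts at the head space */
}
```

Writing to the head space never touches protocol fields, whatever type embeds the sock.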
>>
>> The current thinking is to build upon the existing bpf_sk_local_storage usage. A
>> boot param decides how much BPF space should be allocated for 'struct
>> sock'. When a bpf_sk_storage_map is created (with a new use_reserve flag), the
>> space will be allocated permanently from the head space of every sk for this
>> map. A BPF prog will read it at one stable offset before the sk. If there is no
>> head space left, map creation will fail, and the user can decide whether to
>> retry without the 'use_reserve' flag.
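The reservation step can be sketched as a simple bump allocator over the head space. This assumes a hypothetical bpf_sk_head_space variable standing in for the boot param and a hypothetical toy_reserve_sk_space() called at map creation when use_reserve is set; it is an illustration, not the proposed kernel code. There is no release path, mirroring "allocated permanently", and each map's offset stays stable once assigned:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the boot param: total BPF head space per sk. */
static size_t bpf_sk_head_space = 64;
static size_t bpf_sk_head_used;

/* Reserve value_size bytes for one map. On success, store the (negative)
 * offset of the map's value relative to the sock in *off and return 0.
 * Return -1 when the head space is exhausted; the user may then retry
 * without use_reserve. */
static int toy_reserve_sk_space(size_t value_size, long *off)
{
	size_t sz = (value_size + 7) & ~(size_t)7; /* keep 8-byte alignment */

	if (sz > bpf_sk_head_space - bpf_sk_head_used)
		return -1;
	bpf_sk_head_used += sz;
	/* Slices grow away from the sock, so a map's offset never changes
	 * when later maps reserve space. */
	*off = -(long)bpf_sk_head_used;
	return 0;
}
```

A BPF prog for such a map would then read its value at (u8 *)sk + off, a single load with no pointer chasing.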
> 
> Thanks for sharing the plans.
> 
> We will definitely be looking into ways of eliminating allocations in
> the long run. With one allocation for skb_ext, one for
> bpf_local_storage, and one for the actual map, it seems unlikely we will
> be able to attach metadata this way to every packet, which is what we
> wanted for our "label packet once, use label everywhere" use case.
> 
> I'm not sure how much we can squeeze in together with the sk_buff.
> Hopefully at least skb_ext plus a pointer to bpf_local_storage.

Yeah, only a bpf_local_storage pointer is needed in skb (or in skb_ext). 
It is the same for the bpf sk/task/... storage.

To be clear, for allocation in skb, I was thinking more about Paolo's 
comment on "...increasing struct sk_buff size as an alternative to the 
mptcp skb extension...".

> 
> I'm also hoping we can allocate memory for bpf_local_storage together
> with the backing space for the map whose update triggers the skb
> extension activation.

Allocate the actual storage at the end of bpf_local_storage? Hmm... off 
the top of my head, I don't have a good idea how to do it without 
trading off flexibility. If we are trading off flexibility anyway, we 
may as well allocate fixed extra space at the sk (/skb) and get a 
performance benefit (which would need to be measured).
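To illustrate the flexibility trade-off, here is a sketch using a hypothetical toy_storage type (not bpf_local_storage) that places map values inline after the header with a C flexible array member. It needs only one allocation, but the capacity is fixed when the block is allocated; adding a value for a map created later would require a realloc, which may move the block and invalidate pointers already handed out:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical header with the map values stored inline after it. */
struct toy_storage {
	size_t used;          /* bytes of data[] handed out so far */
	size_t cap;           /* bytes of data[] allocated, fixed at alloc */
	unsigned char data[]; /* flexible array member: one allocation */
};

static struct toy_storage *toy_storage_alloc(size_t cap)
{
	struct toy_storage *s = malloc(sizeof(*s) + cap);

	if (!s)
		return NULL;
	s->used = 0;
	s->cap = cap;
	return s;
}

/* Return a pointer to value_size bytes inside s->data, or NULL once the
 * space fixed at allocation time has run out. */
static void *toy_storage_add(struct toy_storage *s, size_t value_size)
{
	size_t sz = (value_size + 7) & ~(size_t)7; /* keep values 8-byte aligned */
	void *p;

	if (sz > s->cap - s->used)
		return NULL;
	p = s->data + s->used;
	s->used += sz;
	return p;
}
```

The NULL return is where flexibility is lost: a fixed reservation at the sk (/skb) has the same limit but avoids the separate storage allocation altogether.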

> 
> Finally, bpf_local_storage itself has a pretty generous cache which
> bloats it. Maybe the cache could be a flexible array, which could be
> smaller for skb local storage.
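A sketch of the flexible-array cache idea, assuming a hypothetical toy_local_storage whose cache length is chosen per owner type at allocation time (for example 16 slots for sk and 2 for skb) rather than being a compile-time constant. Folding a map's cache index onto a smaller cache with modulo is an assumption made here for illustration; in a design like this, two maps sharing a slot would only cost cache misses and slower lookups:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_elem; /* opaque per-map storage element, not modelled here */

/* Hypothetical: cache length picked per owner type at allocation time. */
struct toy_local_storage {
	unsigned int cache_size;
	struct toy_elem *cache[]; /* flexible array: only cache_size slots */
};

static struct toy_local_storage *toy_local_storage_alloc(unsigned int cache_size)
{
	struct toy_local_storage *s;

	if (cache_size == 0)
		return NULL; /* need at least one slot for the modulo below */
	s = calloc(1, sizeof(*s) + cache_size * sizeof(s->cache[0]));
	if (s)
		s->cache_size = cache_size;
	return s;
}

/* Each map gets a cache index when it is created; owners with a smaller
 * cache fold that index onto their slots. */
static struct toy_elem **toy_cache_slot(struct toy_local_storage *s,
					unsigned int map_cache_idx)
{
	return &s->cache[map_cache_idx % s->cache_size];
}
```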

For our usage, the cache has been slowly filling up, so we actually have 
the other side of the issue. Improvements to bpf_local_storage are always 
welcome.

I am currently more interested in getting the extra memory/headroom 
allocated for an sk. Eventually, the storage(s) needed for all (or most) 
sks will use the extra headroom of the sk. The current bpf_local_storage 
pointer in sk will then be more for testing/ad-hoc purposes or for 
performance-insensitive usage.

This is probably off topic now. It seems that extra tail space in an skb 
is not part of your current plan for the next respin.

