Message-ID: <878qdltsg0.fsf@cloudflare.com>
Date: Sun, 25 Jan 2026 20:15:11 +0100
From: Jakub Sitnicki <jakub@...udflare.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: netdev@...r.kernel.org,  "David S. Miller" <davem@...emloft.net>,  Eric
 Dumazet <edumazet@...gle.com>,  Paolo Abeni <pabeni@...hat.com>,  Simon
 Horman <horms@...nel.org>,  Michael Chan <michael.chan@...adcom.com>,
  Pavan Chebbi <pavan.chebbi@...adcom.com>,  Andrew Lunn
 <andrew+netdev@...n.ch>,  Tony Nguyen <anthony.l.nguyen@...el.com>,
  Przemek Kitszel <przemyslaw.kitszel@...el.com>,  Saeed Mahameed
 <saeedm@...dia.com>,  Leon Romanovsky <leon@...nel.org>,  Tariq Toukan
 <tariqt@...dia.com>,  Mark Bloch <mbloch@...dia.com>,  Alexei Starovoitov
 <ast@...nel.org>,  Daniel Borkmann <daniel@...earbox.net>,  Jesper
 Dangaard Brouer <hawk@...nel.org>,  John Fastabend
 <john.fastabend@...il.com>,  Stanislav Fomichev <sdf@...ichev.me>,
  intel-wired-lan@...ts.osuosl.org,  bpf@...r.kernel.org,
  kernel-team@...udflare.com,  Jakub Kicinski <kuba@...nel.org>,  Amery
 Hung <ameryhung@...il.com>
Subject: Re: [PATCH net-next 00/10] Call skb_metadata_set when skb->data
 points past metadata

On Thu, Jan 22, 2026 at 12:21 PM -08, Martin KaFai Lau wrote:
> On 1/13/26 4:33 AM, Jakub Sitnicki wrote:
>> Good point. I'm hoping we don't have to allocate from
>> skb_metadata_set(), which does sound prohibitively expensive. Instead
>> we'd allocate the extension together with the skb if we know upfront
>> that metadata will be used.
>
> [ Sorry for being late. Have been catching up after holidays. ]
>
> For the sk local storage (which was mentioned in other replies as making
> skb metadata look more like sk local storage), there is a plan (Amery has
> been looking into it) to allocate the storage together with sk for
> performance reasons. This means allocating a larger 'struct sock'. The extra
> space will be at the front of sk instead of the end of sk because of how
> 'struct sock' is embedded in tcp_sock/udp_sock/... If skb is going in the
> same direction, it should be useful to have a similar scheme: upfront
> allocation, then sharing by multiple BPF progs.
>
> The current thinking is to build upon the existing bpf_sk_local_storage
> usage. A boot param decides how much BPF space should be allocated for
> 'struct sock'. When a bpf_sk_storage_map is created (with a new use_reserve
> flag), the space will be allocated permanently from the head space of every
> sk for this map. The read (from a BPF prog) will be at one stable offset
> before a sk. If there is no more head space left, the map creation will
> fail. The user can decide whether to retry without the 'use_reserve' flag.

Thanks for sharing the plans.
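
If I understand the scheme correctly, a minimal userspace sketch of it
would look something like below. All names, the boot param, and the sizes
are hypothetical stand-ins, of course:

/*
 * Sketch of the "reserved head space" scheme described above.
 * The real thing would live in the sk allocation and map
 * creation paths; everything here is a made-up stand-in.
 */
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for 'struct sock'. In the kernel it sits at offset 0
 * of tcp_sock/udp_sock/..., which is why the reserve has to go
 * in front of it rather than after it. */
struct sock_stub {
	int state;
};

/* Reserve size, as a boot param would set it (hypothetical). */
static size_t bpf_sk_reserve = 64;

/* Allocate reserve + sock in one block; hand back the sock
 * itself so existing users are unaffected. */
static struct sock_stub *sk_alloc_with_reserve(void)
{
	char *base = calloc(1, bpf_sk_reserve + sizeof(struct sock_stub));

	return base ? (struct sock_stub *)(base + bpf_sk_reserve) : NULL;
}

/* A map created with a use_reserve-style flag gets a fixed slot
 * carved out of the reserve; BPF progs then read at one stable
 * offset *before* the sk. NULL means no head space left, i.e.
 * map creation would fail. */
static void *sk_reserved_slot(struct sock_stub *sk, size_t off, size_t len)
{
	if (off + len > bpf_sk_reserve)
		return NULL;
	return (char *)sk - bpf_sk_reserve + off;
}

int main(void)
{
	struct sock_stub *sk = sk_alloc_with_reserve();
	int *slot;

	if (!sk)
		return 1;
	slot = sk_reserved_slot(sk, 0, sizeof(*slot));
	*slot = 42;	/* what a BPF prog's read/write would hit */
	printf("stable-offset slot: %d\n", *slot);
	free((char *)sk - bpf_sk_reserve);
	return 0;
}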

We will definitely be looking into ways of eliminating allocations in
the long run. With one allocation for the skb_ext, one for the
bpf_local_storage, and one for the actual map, it seems unlikely we
will be able to attach metadata this way to every packet, which is
something we wanted for our "label packet once, use label everywhere"
use case.

I'm not sure how much we can squeeze in together with the sk_buff.
Hopefully at least skb_ext plus a pointer to bpf_local_storage.

I'm also hoping we can allocate memory for bpf_local_storage together
with the backing space for the map whose update triggers the skb
extension activation.
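
As a sketch of what I mean, with made-up names and ignoring the actual
struct bpf_local_storage layout:

#include <stdlib.h>

/* One allocation covering both the local-storage bookkeeping
 * and the backing space for the (single) metadata map value. */
struct skb_meta_storage {
	void *owner;		/* stand-in for the bookkeeping fields */
	size_t value_size;
	unsigned char value[];	/* the map's backing space */
};

static struct skb_meta_storage *meta_storage_alloc(size_t value_size)
{
	struct skb_meta_storage *ms = calloc(1, sizeof(*ms) + value_size);

	if (ms)
		ms->value_size = value_size;
	return ms;	/* first update would also activate the skb ext */
}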

Finally, bpf_local_storage itself has a pretty generous cache, which
blows up its size. Maybe the cache could become a flexible array, which
could be smaller for skb local storage.
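
Roughly along these lines (again just a sketch with made-up names):

#include <stdlib.h>

/* Today the cache in struct bpf_local_storage is a fixed array
 * of BPF_LOCAL_STORAGE_CACHE_SIZE (16) slots; a flexible array
 * member would let skb local storage pick fewer. */
struct local_storage_stub {
	unsigned int cache_size;	/* chosen per consumer */
	void *cache[];			/* instead of cache[16] */
};

static struct local_storage_stub *storage_alloc(unsigned int slots)
{
	struct local_storage_stub *s;

	s = calloc(1, sizeof(*s) + slots * sizeof(s->cache[0]));
	if (s)
		s->cache_size = slots;
	return s;
}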

All just ideas ATM. The initial RFC won't have any of these
optimizations.
