Message-ID: <aJ0EP7Dzl182WE7i@mini-arch>
Date: Wed, 13 Aug 2025 14:31:43 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: bpf@...r.kernel.org, ast@...nel.org, daniel@...earbox.net,
andrii@...nel.org, netdev@...r.kernel.org,
magnus.karlsson@...el.com, aleksander.lobakin@...el.com,
Eryk Kubanski <e.kubanski@...tner.samsung.com>
Subject: Re: [PATCH v4 bpf] xsk: fix immature cq descriptor production
On 08/13, Maciej Fijalkowski wrote:
> Eryk reported an issue, which I have put under the Closes: tag, of umem
> addrs being prematurely produced onto the pool's completion queue.
> Let us make the skb's destructor responsible for producing all addrs
> that a given skb used.
>
> Introduce struct xsk_addrs, which carries the descriptor count along
> with an array of addresses taken from the processed descriptors; it is
> passed around via skb_shared_info::destructor_arg. This way we can
> refer to it within xsk_destruct_skb(). In order to mitigate the
> overhead of the memory allocations, introduce a kmem_cache of
> xsk_addrs, but be smart about scalability, as assigning a unique cache
> to each socket might be expensive.
>
> Store the kmem_caches as percpu variables with embedded refcounting
> and let the xsk code index them by queue id. In the case where NIC#0
> already runs a copy-mode xsk socket on queue #10 and the user starts a
> socket on the <NIC#1,qid#10> tuple, the kmem_cache is reused. Keep the
> pointer to the kmem_cache in xdp_sock to avoid accessing bss in the
> data path.
>
> The commit from the Fixes: tag introduced the buggy behavior; it was
> not broken from day 1, but rather became broken when xsk multi-buffer
> support was introduced.
>
> Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> Reported-by: Eryk Kubanski <e.kubanski@...tner.samsung.com>
> Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
> Acked-by: Magnus Karlsson <magnus.karlsson@...el.com>
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> ---
>
> v1:
> https://lore.kernel.org/bpf/20250702101648.1942562-1-maciej.fijalkowski@intel.com/
> v2:
> https://lore.kernel.org/bpf/20250705135512.1963216-1-maciej.fijalkowski@intel.com/
> v3:
> https://lore.kernel.org/bpf/20250806154127.2161434-1-maciej.fijalkowski@intel.com/
>
> v1->v2:
> * store addrs in array carried via destructor_arg instead having them
> stored in skb headroom; cleaner and less hacky approach;
> v2->v3:
> * use kmem_cache for xsk_addrs allocation (Stan/Olek)
> * set err when xsk_addrs allocation fails (Dan)
> * change xsk_addrs layout to avoid holes
> * free xsk_addrs on error path
> * rebase
> v3->v4:
[..]
> * have kmem_cache as percpu vars
I wonder whether we're overthinking this and should just have one
"global" kmem_cache for the xsk subsystem. The mm layer should,
presumably, already do all of that percpu handling internally. Have
you seen any perf issues with that approach?