netdev - Re: [PATCH v2 bpf] xsk: fix immature cq descriptor production

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <beaab8ec-d11a-4147-b7f4-487a4c3fe45b@intel.com>
Date: Tue, 8 Jul 2025 17:54:10 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
CC: Stanislav Fomichev <stfomichev@...il.com>, <bpf@...r.kernel.org>,
	<ast@...nel.org>, <daniel@...earbox.net>, <andrii@...nel.org>,
	<netdev@...r.kernel.org>, <magnus.karlsson@...el.com>, Eryk Kubanski
	<e.kubanski@...tner.samsung.com>
Subject: Re: [PATCH v2 bpf] xsk: fix immature cq descriptor production

From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Date: Tue, 8 Jul 2025 16:14:39 +0200

> On Mon, Jul 07, 2025 at 11:40:48AM -0700, Stanislav Fomichev wrote:
>> On 07/07, Alexander Lobakin wrote:

[...]

>>> BTW isn't num_descs from that new structure would be the same as
>>> shinfo->nr_frags + 1 (or just nr_frags for xsk_build_skb_zerocopy())?
>>
>> So you're saying we don't need to store it? Agreed. But storing the rest
>> in cb still might be problematic with kconfig-configurable MAX_SKB_FRAGS?

For sure skb->cb is too small for 17+ u64s.

> 
> Hi Stan & Olek,
> 
> no, as said in v1 drivers might linearize the skb and all frags will be
> lost. This storage is needed unfortunately.

Aaah sorry. In this case yeah, you need this separate frag count.

> 
>>
>>>> Can we pre-allocate an array of xsk_addrs during xsk_bind (the number of
>>>> xsk_addrs is bound by the tx ring size)? Then we can remove the alloc on tx
>>>> and replace it with some code to manage that pool of xsk_addrs..
> 
> That would be pool-bound which makes it a shared resource so I believe
> that we would repeat the problem being fixed here ;)

Except the system Page Pool idea right below maybe :>

> 
>>>
>>> Nice idea BTW.
>>>
>>> We could even use system per-cpu Page Pools to allocate these structs*
>>> :D It wouldn't waste 1 page per one struct as PP is frag-aware and has
>>> API for allocating only a small frag.
>>>
>>> Headroom stuff was also ok to me: we either way allocate a new skb, so
>>> we could allocate it with a bit bigger headroom and put that table there
>>> being sure that nobody will overwrite it (some drivers insert special
>>> headers or descriptors in front of the actual skb->data).
> 
> headroom approach was causing one of bpf selftests to fail, but I didn't
> check in-depth the reason. I didn't really like the check in destructor if
> addr array was corrupted in v1 and I came up with v2 which seems to me a
> cleaner fix.
> 
>>>
>>> [*] Offtop: we could also use system PP to allocate skbs in
>>> xsk_build_skb() just like it's done in xdp_build_skb_from_zc() +
>>> xdp_copy_frags_from_zc() -- no way to avoid memcpy(), but the payload
>>> buffers would be recycled then.
>>
>> Or maybe kmem_cache_alloc_node with a custom cache is good enough?
>> Headroom also feels ok if we store the whole xsk_addrs struct in it.
> 
> Yep both of these approaches was something I considered, but keep in mind
> it's a bugfix so I didn't want to go with something flashy. I have not
> observed big performance impact but I checked only MAX_SKB_FRAGS being set
> to standard value.
> 
> Would you guys be ok if I do the follow-up with possible optimization
> after my vacation which would be a -next candidate?

As a fix, it's totally fine for me to go in the current form, sure.

> 
> Thanks,
> MF

Thanks,
Olek