netdev - Re: [PATCH bpf] xsk: fix immature cq descriptor production


Message-ID: <aGUSM6EncW/7j/B1@boxer>
Date: Wed, 2 Jul 2025 13:04:19 +0200
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: <bpf@...r.kernel.org>, <ast@...nel.org>, <daniel@...earbox.net>,
	<andrii@...nel.org>
CC: <netdev@...r.kernel.org>, <magnus.karlsson@...el.com>, Eryk Kubanski
	<e.kubanski@...tner.samsung.com>
Subject: Re: [PATCH bpf] xsk: fix immature cq descriptor production

On Wed, Jul 02, 2025 at 12:16:48PM +0200, Maciej Fijalkowski wrote:
> Eryk reported an issue that I have put under Closes: tag, related to
> umem addrs being prematurely produced onto pool's completion queue.
> Let us make the skb's destructor responsible for producing all addrs
> that given skb used.
> 
> Commit from fixes tag introduced the buggy behavior, it was not broken
> from day 1, but rather when xsk multi-buffer got introduced.
> 
> Store addrs at the beginning of skb's linear part and have a sanity
> check if in any case driver would encapsulate headers in a way that data
> would overwrite the [head, head + sizeof(xdp_desc::addr) *
> (MAX_SKB_FRAGS + 1)] region, which we dedicate for umem addresses that
> will be produced onto xsk_buff_pool's completion queue.
> 
> This approach appears to survive scenario where underlying driver
> linearizes the skb because pskb_pull_tail() under the hood will copy
> header part to newly allocated memory. If this array would live in
> tailroom it would get overridden when pulling frags onto linear part.
> This happens when driver receives skb with frag count higher than what
> HW is able to swallow (I came across this case on ice driver that has
> maximum s/g count equal to 8).
> 
> Initially we also considered storing 8-byte addr at the end of page
> allocated by frag but xskxceiver has a test which writes full 4k to frag
> and this resulted in corrupted addr.
> 
> xsk_cq_submit_addr_locked() has to use xsk_get_num_desc() to find out
> frag count as skb that we deal with within destructor might not have the
> frags at all - as mentioned earlier drivers in their ndo_start_xmit()
> might linearize the skb. We will not use cached_prod to update
> producer's global state as its value might already have been increased,
> which would result in too many addresses being submitted onto cq.
> 
> Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> Reported-by: Eryk Kubanski <e.kubanski@...tner.samsung.com>
> Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
> ---
>  net/xdp/xsk.c       | 92 +++++++++++++++++++++++++++++++--------------
>  net/xdp/xsk_queue.h | 12 ++++++
>  2 files changed, 75 insertions(+), 29 deletions(-)
> 

There's a CI failure in the xsk metadata selftest, which I didn't run on
my side since I focused on xdpsock + xskxceiver, so I'll be taking a look
at that. I also think we can avoid the skb headroom hack by allocating a
struct that holds num_desc plus the addrs array and carrying it via
destructor_arg.
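
A minimal userspace sketch of that alternative, under stated assumptions: it
is not the kernel patch. The names xsk_addrs_sketch, cq_sketch,
xsk_addrs_alloc(), xsk_addrs_record() and xsk_destruct_sketch() are all
hypothetical, malloc()/free() stand in for kernel allocation, and a plain
array stands in for the pool's completion queue. It only illustrates the idea
of keeping the descriptor count and the umem addresses in one separately
allocated object (the thing destructor_arg would point to) and producing the
addresses onto the cq only when the destructor runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CQ_SIZE 64 /* hypothetical ring size, power of two */

/* Hypothetical bookkeeping object that destructor_arg would point to. */
struct xsk_addrs_sketch {
	uint32_t num_descs;  /* addresses recorded so far */
	uint32_t capacity;   /* number of slots in addrs[] */
	uint64_t addrs[];    /* umem addresses used by this skb */
};

/* Toy stand-in for the pool's completion queue. */
struct cq_sketch {
	uint64_t ring[CQ_SIZE];
	uint32_t producer;
};

/* Allocate the bookkeeping object with room for max_descs addresses. */
static struct xsk_addrs_sketch *xsk_addrs_alloc(uint32_t max_descs)
{
	struct xsk_addrs_sketch *a;

	a = malloc(sizeof(*a) + (size_t)max_descs * sizeof(a->addrs[0]));
	if (!a)
		return NULL;
	a->num_descs = 0;
	a->capacity = max_descs;
	return a;
}

/* Remember one umem address; returns 0 on success, -1 if the array is full. */
static int xsk_addrs_record(struct xsk_addrs_sketch *a, uint64_t addr)
{
	if (a->num_descs >= a->capacity)
		return -1;
	a->addrs[a->num_descs++] = addr;
	return 0;
}

/*
 * Stand-in for the skb destructor: only here are the addresses produced
 * onto the completion queue, so nothing becomes visible to the consumer
 * before the skb is actually released.
 */
static void xsk_destruct_sketch(struct cq_sketch *cq,
				struct xsk_addrs_sketch *a)
{
	uint32_t i;

	assert(a->num_descs <= a->capacity);
	for (i = 0; i < a->num_descs; i++)
		cq->ring[cq->producer++ & (CQ_SIZE - 1)] = a->addrs[i];
	free(a);
}
```

Because the array lives outside the skb's data area in this sketch, it would
not depend on how much headroom the driver consumes or on whether the skb gets
linearized before the destructor runs.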