Message-ID: <aQTBajODN3Nnskta@boxer>
Date: Fri, 31 Oct 2025 15:02:18 +0100
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: Jason Xing <kerneljasonxing@...il.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
	<pabeni@...hat.com>, <bjorn@...nel.org>, <magnus.karlsson@...el.com>,
	<jonathan.lemon@...il.com>, <sdf@...ichev.me>, <ast@...nel.org>,
	<daniel@...earbox.net>, <hawk@...nel.org>, <john.fastabend@...il.com>,
	<joe@...a.to>, <willemdebruijn.kernel@...il.com>, <fmancera@...e.de>,
	<csmate@....hu>, <bpf@...r.kernel.org>, <netdev@...r.kernel.org>, Jason Xing
	<kernelxing@...cent.com>
Subject: Re: [PATCH RFC net-next 2/2] xsk: introduce a cached cq to
 temporarily store descriptor addrs

On Fri, Oct 31, 2025 at 05:32:30PM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@...cent.com>
> 
> Before commit 30f241fcf52a ("xsk: Fix immature cq descriptor
> production"), there was an issue[1] that caused descriptors to be
> published incorrectly under a race condition. That commit fixes the
> issue but adds more memory operations in the xmit hot path and in
> interrupt context, which can hurt performance.
> 
> This patch tries to propose a new solution to fix the problem
> without manipulating the allocation and deallocation of memory. One
> of the key points is that I borrowed the idea from the above commit
> that postpones updating the ring->descs in xsk_destruct_skb()
> instead of in __xsk_generic_xmit().
> 
> The core logic is as shown below:
> 1. Allocate a new local queue. Only its cached_prod member is used.
> 2. Write the descriptors into the local queue in the xmit path, and
>    record the cached_prod as @start_addr, the start position in this
>    queue, so that the skb can later find where its addrs were written
>    during the destruction phase.
> 3. Initialize the upper 24 bits of destructor_arg to store @start_addr
>    in xsk_skb_init_misc().
> 4. Initialize the lower 8 bits of destructor_arg to store how many
>    descriptors the skb owns in xsk_update_num_desc().
> 5. Write the desc addr(s), starting at @start_addr in the cached cq,
>    one by one into the real cq in xsk_destruct_skb(), then sync the
>    global state of the cq.
> 
> The format of destructor_arg is designed as:
>  ------------------------ --------
> |       start_addr       |  num   |
>  ------------------------ --------
> The upper 24 bits are enough to address the temporary descriptors, and
> the lower 8 bits are enough to hold the number of descriptors that one
> skb owns.
> 
> [1]: https://lore.kernel.org/all/20250530095957.43248-1-e.kubanski@partner.samsung.com/
> 
> Signed-off-by: Jason Xing <kernelxing@...cent.com>
> ---
> I posted the series as an RFC because I'd like to hear more opinions on
> the current rough approach, so that the fix[2] can be avoided and its
> performance impact mitigated. This patch might have bugs; I plan to
> spend more time on it after we come to an agreement. Please review the
> overall concept. Thanks!
> 
> Maciej, could you share with me how you tested jumbo frames? I used
> ./xdpsock -i enp2s0f1 -t -q 1 -S -s 9728, but xdpsock utilizes the
> NIC at more than 90%, which means I cannot see the performance impact.
> 
> [2]: https://lore.kernel.org/all/20251030140355.4059-1-fmancera@suse.de/
> ---
>  include/net/xdp_sock.h      |   1 +
>  include/net/xsk_buff_pool.h |   1 +
>  net/xdp/xsk.c               | 104 ++++++++++++++++++++++++++++--------
>  net/xdp/xsk_buff_pool.c     |   1 +
>  4 files changed, 84 insertions(+), 23 deletions(-)

(...)

> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index aa9788f20d0d..6e170107dec7 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -99,6 +99,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
>  
>  	pool->fq = xs->fq_tmp;
>  	pool->cq = xs->cq_tmp;
> +	pool->cached_cq = xs->cached_cq;

Jason,

pool can be shared between multiple sockets that bind to the same
<netdev,qid> tuple. I believe here you're opening the door to the very
same issue Eryk initially reported.

>  
>  	for (i = 0; i < pool->free_heads_cnt; i++) {
>  		xskb = &pool->heads[i];
> -- 
> 2.41.3
> 
