Message-ID: <4c645223-8c52-40d3-889b-f3cf7fa09f89@redhat.com>
Date: Thu, 27 Nov 2025 12:29:19 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Jason Xing <kerneljasonxing@...il.com>, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, bjorn@...nel.org,
 magnus.karlsson@...el.com, maciej.fijalkowski@...el.com,
 jonathan.lemon@...il.com, sdf@...ichev.me, ast@...nel.org,
 daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
 Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 3/3] xsk: remove spin lock protection of
 cached_prod

On 11/25/25 9:54 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@...cent.com>
> 
> Remove the spin lock protection and adjust the related functions accordingly.
> 
> Now cached_prod is fully converted to an atomic, which improves
> performance by around 5% across different platforms.

I must admit I'm surprised by the above delta; AFAIK, replacing a
spinlock 1:1 with an atomic should not measurably impact performance,
as the threads should still see the same contention and issue the same
number of atomic operations on the bus.


> @@ -585,11 +574,9 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
>  	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
>  }
>  
> -static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
> +static void xsk_cq_cached_prod_cancel(struct xsk_buff_pool *pool, u32 n)
>  {
> -	spin_lock(&pool->cq_cached_prod_lock);
>  	atomic_sub(n, &pool->cq->cached_prod_atomic);

It looks like the spinlock and the data it protects live in different
structs.

My wild guess/suspicion is that the real gain comes from avoiding
touching an additional cacheline.
`struct xsk_queue` is 48 bytes and is allocated via kmalloc. Adding up
to 16 bytes there will not change the slab bucket used, and thus will
not change the actual memory usage.

I think that moving the cq_cached* spinlock(s) into `struct xsk_queue`
should give the same gain, with much less code churn. Could you please
have a look at that option?

/P

