Message-ID: <4c645223-8c52-40d3-889b-f3cf7fa09f89@redhat.com>
Date: Thu, 27 Nov 2025 12:29:19 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Jason Xing <kerneljasonxing@...il.com>, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, bjorn@...nel.org,
 magnus.karlsson@...el.com, maciej.fijalkowski@...el.com,
 jonathan.lemon@...il.com, sdf@...ichev.me, ast@...nel.org,
 daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
 Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 3/3] xsk: remove spin lock protection of
 cached_prod

On 11/25/25 9:54 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@...cent.com>
> 
> Remove the spin lock protection and adjust the related functions accordingly.
> 
> Now cached_prod is fully converted to an atomic, which improves
> performance by around 5% across different platforms.

I must admit I'm surprised by the above delta; AFAIK, replacing a
spinlock 1:1 with an atomic should not measurably impact performance,
as the threads should still see the same contention and issue the same
number of atomic operations on the bus.


> @@ -585,11 +574,9 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
>  	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
>  }
>  
> -static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
> +static void xsk_cq_cached_prod_cancel(struct xsk_buff_pool *pool, u32 n)
>  {
> -	spin_lock(&pool->cq_cached_prod_lock);
>  	atomic_sub(n, &pool->cq->cached_prod_atomic);

It looks like the spinlock and the data it protects live in different
structs.

My wild guess/suspicion is that the real gain comes from avoiding
touching an additional cacheline.
`struct xsk_queue` is 48 bytes and is allocated via kmalloc. Adding up
to 16 bytes there will not change the slab bucket used, and thus will
not change the actual memory usage.

I think that moving the cq_cached* spinlock(s) into `struct xsk_queue`
should give the same gain, with much less code churn. Could you please
have a look at that option?

/P

