netdev - Re: [PATCH net-next 09/10] udp: make busylock per socket

Message-ID: <willemdebruijn.kernel.e4b37db8cf47@gmail.com>
Date: Tue, 16 Sep 2025 12:31:43 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Eric Dumazet <edumazet@...gle.com>, 
 "David S . Miller" <davem@...emloft.net>, 
 Jakub Kicinski <kuba@...nel.org>, 
 Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>, 
 Willem de Bruijn <willemb@...gle.com>, 
 Kuniyuki Iwashima <kuniyu@...gle.com>, 
 David Ahern <dsahern@...nel.org>, 
 netdev@...r.kernel.org, 
 eric.dumazet@...il.com, 
 Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next 09/10] udp: make busylock per socket

Eric Dumazet wrote:
> While having all spinlocks packed into an array was a space saver,
> this also caused NUMA imbalance and hash collisions.
> 
> UDPv6 socket size becomes 1600 after this patch.
> 
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
>  include/linux/udp.h |  1 +
>  include/net/udp.h   |  1 +
>  net/ipv4/udp.c      | 20 ++------------------
>  3 files changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/udp.h b/include/linux/udp.h
> index 6ed008ab166557e868c1918daaaa5d551b7989a7..e554890c4415b411f35007d3ece9e6042db7a544 100644
> --- a/include/linux/udp.h
> +++ b/include/linux/udp.h
> @@ -109,6 +109,7 @@ struct udp_sock {
>  	 */
>  	struct hlist_node	tunnel_list;
>  	struct numa_drop_counters drop_counters;
> +	spinlock_t		busylock ____cacheline_aligned_in_smp;
>  };
>  
>  #define udp_test_bit(nr, sk)			\
> diff --git a/include/net/udp.h b/include/net/udp.h
> index a08822e294b038c0d00d4a5f5cac62286a207926..eecd64097f91196897f45530540b9c9b68c5ba4e 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -289,6 +289,7 @@ static inline void udp_lib_init_sock(struct sock *sk)
>  	struct udp_sock *up = udp_sk(sk);
>  
>  	sk->sk_drop_counters = &up->drop_counters;
> +	spin_lock_init(&up->busylock);
>  	skb_queue_head_init(&up->reader_queue);
>  	INIT_HLIST_NODE(&up->tunnel_list);
>  	up->forward_threshold = sk->sk_rcvbuf >> 2;
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 25143f932447df2a84dd113ca33e1ccf15b3503c..7d1444821ee51a19cd5fd0dd5b8d096104c9283c 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1689,17 +1689,11 @@ static void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
>   * to relieve pressure on the receive_queue spinlock shared by consumer.
>   * Under flood, this means that only one producer can be in line
>   * trying to acquire the receive_queue spinlock.
> - * These busylock can be allocated on a per cpu manner, instead of a
> - * per socket one (that would consume a cache line per socket)
>   */
> -static int udp_busylocks_log __read_mostly;
> -static spinlock_t *udp_busylocks __read_mostly;
> -
> -static spinlock_t *busylock_acquire(void *ptr)
> +static spinlock_t *busylock_acquire(struct sock *sk)
>  {
> -	spinlock_t *busy;
> +	spinlock_t *busy = &udp_sk(sk)->busylock;
>  
> -	busy = udp_busylocks + hash_ptr(ptr, udp_busylocks_log);
>  	spin_lock(busy);
>  	return busy;
>  }
> @@ -3997,7 +3991,6 @@ static void __init bpf_iter_register(void)
>  void __init udp_init(void)
>  {
>  	unsigned long limit;
> -	unsigned int i;
>  
>  	udp_table_init(&udp_table, "UDP");
>  	limit = nr_free_buffer_pages() / 8;
> @@ -4006,15 +3999,6 @@ void __init udp_init(void)
>  	sysctl_udp_mem[1] = limit;
>  	sysctl_udp_mem[2] = sysctl_udp_mem[0] * 2;
>  
> -	/* 16 spinlocks per cpu */
> -	udp_busylocks_log = ilog2(nr_cpu_ids) + 4;
> -	udp_busylocks = kmalloc(sizeof(spinlock_t) << udp_busylocks_log,
> -				GFP_KERNEL);

Is a per-socket busylock preferable to growing this array to be
fully per-CPU (and converting it to a percpu allocation to avoid
false sharing)?

Is that because a per-CPU array would take a lot of space on modern
server platforms? Just trying to understand the trade-off made.

> -	if (!udp_busylocks)
> -		panic("UDP: failed to alloc udp_busylocks\n");
> -	for (i = 0; i < (1U << udp_busylocks_log); i++)
> -		spin_lock_init(udp_busylocks + i);
> -
>  	if (register_pernet_subsys(&udp_sysctl_ops))
>  		panic("UDP: failed to init sysctl parameters.\n");
>  
> -- 
> 2.51.0.384.g4c02a37b29-goog
>