Message-ID: <CANn89i+KGxhZNmFw8TsD9GzQ8=Acag_ALDw9AB5A4gupBpRzQQ@mail.gmail.com>
Date: Tue, 16 Sep 2025 10:10:17 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Willem de Bruijn <willemb@...gle.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, David Ahern <dsahern@...nel.org>, 
	netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 09/10] udp: make busylock per socket

On Tue, Sep 16, 2025 at 9:31 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Eric Dumazet wrote:
> > While having all spinlocks packed into an array was a space saver,
> > this also caused NUMA imbalance and hash collisions.
> >
> > UDPv6 socket size becomes 1600 after this patch.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > ---
> >  include/linux/udp.h |  1 +
> >  include/net/udp.h   |  1 +
> >  net/ipv4/udp.c      | 20 ++------------------
> >  3 files changed, 4 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > index 6ed008ab166557e868c1918daaaa5d551b7989a7..e554890c4415b411f35007d3ece9e6042db7a544 100644
> > --- a/include/linux/udp.h
> > +++ b/include/linux/udp.h
> > @@ -109,6 +109,7 @@ struct udp_sock {
> >        */
> >       struct hlist_node       tunnel_list;
> >       struct numa_drop_counters drop_counters;
> > +     spinlock_t              busylock ____cacheline_aligned_in_smp;
> >  };
> >
> >  #define udp_test_bit(nr, sk)                 \
> > diff --git a/include/net/udp.h b/include/net/udp.h
> > index a08822e294b038c0d00d4a5f5cac62286a207926..eecd64097f91196897f45530540b9c9b68c5ba4e 100644
> > --- a/include/net/udp.h
> > +++ b/include/net/udp.h
> > @@ -289,6 +289,7 @@ static inline void udp_lib_init_sock(struct sock *sk)
> >       struct udp_sock *up = udp_sk(sk);
> >
> >       sk->sk_drop_counters = &up->drop_counters;
> > +     spin_lock_init(&up->busylock);
> >       skb_queue_head_init(&up->reader_queue);
> >       INIT_HLIST_NODE(&up->tunnel_list);
> >       up->forward_threshold = sk->sk_rcvbuf >> 2;
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 25143f932447df2a84dd113ca33e1ccf15b3503c..7d1444821ee51a19cd5fd0dd5b8d096104c9283c 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1689,17 +1689,11 @@ static void udp_skb_dtor_locked(struct sock *sk, struct sk_buff *skb)
> >   * to relieve pressure on the receive_queue spinlock shared by consumer.
> >   * Under flood, this means that only one producer can be in line
> >   * trying to acquire the receive_queue spinlock.
> > - * These busylock can be allocated on a per cpu manner, instead of a
> > - * per socket one (that would consume a cache line per socket)
> >   */
> > -static int udp_busylocks_log __read_mostly;
> > -static spinlock_t *udp_busylocks __read_mostly;
> > -
> > -static spinlock_t *busylock_acquire(void *ptr)
> > +static spinlock_t *busylock_acquire(struct sock *sk)
> >  {
> > -     spinlock_t *busy;
> > +     spinlock_t *busy = &udp_sk(sk)->busylock;
> >
> > -     busy = udp_busylocks + hash_ptr(ptr, udp_busylocks_log);
> >       spin_lock(busy);
> >       return busy;
> >  }
> > @@ -3997,7 +3991,6 @@ static void __init bpf_iter_register(void)
> >  void __init udp_init(void)
> >  {
> >       unsigned long limit;
> > -     unsigned int i;
> >
> >       udp_table_init(&udp_table, "UDP");
> >       limit = nr_free_buffer_pages() / 8;
> > @@ -4006,15 +3999,6 @@ void __init udp_init(void)
> >       sysctl_udp_mem[1] = limit;
> >       sysctl_udp_mem[2] = sysctl_udp_mem[0] * 2;
> >
> > -     /* 16 spinlocks per cpu */
> > -     udp_busylocks_log = ilog2(nr_cpu_ids) + 4;
> > -     udp_busylocks = kmalloc(sizeof(spinlock_t) << udp_busylocks_log,
> > -                             GFP_KERNEL);
>
> A per-sock busylock is preferable over growing this array to be fully
> percpu (and converting it to percpu allocation to avoid false sharing)?
>
> Because that would take a lot of space on modern server platforms?
> Just trying to understand the trade-off made.

The goal of the busylock is to have a single gate before sk->sk_receive_queue.

Per-cpu spinlocks would not fit that need: producers running on
different CPUs would each take a different lock and still pile up on
the same receive_queue spinlock.
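
To make the "single gate" point concrete, here is a rough userspace
sketch of the pattern (hypothetical names, pthread mutexes standing in
for kernel spinlocks, and the "only when the queue is busy" heuristic
dropped); it is an illustration, not the actual kernel code:

#include <pthread.h>

/* Hypothetical analogue of the locks involved in struct udp_sock. */
struct fake_udp_sock {
	pthread_mutex_t receive_queue_lock; /* shared with the consumer */
	pthread_mutex_t busylock;           /* per-socket gate, as in the patch */
};

/* Producer path: take the per-socket gate first, so under flood at most
 * one producer at a time is waiting on the receive_queue lock that the
 * consumer also needs. */
static void fake_enqueue(struct fake_udp_sock *sk)
{
	pthread_mutex_lock(&sk->busylock);
	pthread_mutex_lock(&sk->receive_queue_lock);
	/* ... append the packet to the receive queue ... */
	pthread_mutex_unlock(&sk->receive_queue_lock);
	pthread_mutex_unlock(&sk->busylock);
}

int main(void)
{
	struct fake_udp_sock sk = {
		.receive_queue_lock = PTHREAD_MUTEX_INITIALIZER,
		.busylock           = PTHREAD_MUTEX_INITIALIZER,
	};

	fake_enqueue(&sk);
	return 0;
}

Making the gate per socket (instead of a shared hashed array) also keeps
producers for different sockets, possibly on different NUMA nodes, from
colliding on the same lock.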

Note that having per-NUMA receive queues is on my plate, but not finished yet.

I tried to remove the busylock entirely (modern UDP has a second queue,
up->reader_queue, and __skb_recv_udp() splices skbs into it in batches),
but the busylock was still beneficial.
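
For context on the reader_queue point, here is a similarly rough sketch
of the splice-in-batches idea (again hypothetical names, not the real
__skb_recv_udp()): the consumer takes the shared lock once per batch,
moves everything into a private list, and then dequeues from that list
without touching the contended lock again.

#include <pthread.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
	/* payload elided */
};

struct fake_sock {
	pthread_mutex_t receive_queue_lock;
	struct pkt *receive_queue; /* filled by producers under the lock */
	struct pkt *reader_queue;  /* private to the single consumer */
};

/* Consumer path: refill the private reader_queue only when it is empty,
 * paying for the shared lock once per batch instead of once per packet.
 * (Ordering details are deliberately glossed over here.) */
static struct pkt *fake_recv(struct fake_sock *sk)
{
	struct pkt *p;

	if (!sk->reader_queue) {
		pthread_mutex_lock(&sk->receive_queue_lock);
		sk->reader_queue = sk->receive_queue; /* splice the whole batch */
		sk->receive_queue = NULL;
		pthread_mutex_unlock(&sk->receive_queue_lock);
	}
	p = sk->reader_queue;
	if (p)
		sk->reader_queue = p->next;
	return p;
}

int main(void)
{
	struct fake_sock sk = {
		.receive_queue_lock = PTHREAD_MUTEX_INITIALIZER,
	};
	struct pkt one = { .next = NULL };

	pthread_mutex_lock(&sk.receive_queue_lock); /* pretend producer */
	sk.receive_queue = &one;
	pthread_mutex_unlock(&sk.receive_queue_lock);

	return fake_recv(&sk) == &one ? 0 : 1;
}

Even with this batching on the consumer side, every producer still has
to take receive_queue_lock to append, which is why the gate in front of
it still pays off under flood.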
