Message-ID: <willemdebruijn.kernel.af97f0e88745@gmail.com>
Date: Mon, 22 Sep 2025 08:52:32 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Kuniyuki Iwashima <kuniyu@...gle.com>,
netdev@...r.kernel.org,
eric.dumazet@...il.com,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH v4 net-next] udp: remove busylock and add per NUMA queues
Eric Dumazet wrote:
> busylock was protecting UDP sockets against packet floods,
> but unfortunately was not protecting the host itself.
>
> Under stress, many cpus could spin while acquiring the busylock,
> and the NIC had to drop packets, or packets would be dropped
> in the cpu backlog if RPS/RFS were in place.
>
> This patch replaces the busylock with intermediate
> lockless queues (one queue per NUMA node).
>
> This means that fewer cpus have to acquire
> the UDP receive queue lock.
>
> Most cpus can either:
> - immediately drop the packet, or
> - queue it in their NUMA-aware lockless queue.
>
> Then one of the cpus is chosen to process this lockless queue
> in a batch.
>
> The batch only contains packets that were cooked on the same
> NUMA node, so the latency impact is very limited.
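
The batching pattern described above can be sketched in user space with
C11 atomics. This is only an illustration of the idea, not the kernel
code: producers push onto a per-NUMA-node lock-free stack, and (as one
possible election scheme) the producer that finds the stack empty is
the one chosen to splice off and process the whole batch. All names
below are made up for the example.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct node {
            struct node *next;
            /* packet payload would live here */
    };

    struct numa_queue {
            _Atomic(struct node *) head; /* lock-free LIFO, one per NUMA node */
    };

    /*
     * Push a node; returns true if the queue was previously empty,
     * meaning this cpu is elected to drain the batch.
     */
    static bool queue_push(struct numa_queue *q, struct node *n)
    {
            struct node *old = atomic_load_explicit(&q->head,
                                                    memory_order_relaxed);
            do {
                    n->next = old;
            } while (!atomic_compare_exchange_weak_explicit(&q->head, &old, n,
                            memory_order_release, memory_order_relaxed));
            return old == NULL;
    }

    /*
     * Detach the whole batch at once; only the elected cpu calls this,
     * then processes the nodes while holding the single receive queue
     * lock.
     */
    static struct node *queue_drain(struct numa_queue *q)
    {
            return atomic_exchange_explicit(&q->head, NULL,
                                            memory_order_acquire);
    }

    int main(void)
    {
            struct numa_queue q = { .head = NULL };
            struct node a = { 0 }, b = { 0 };
            bool elected = queue_push(&q, &a); /* first producer: elected */

            queue_push(&q, &b);                /* later producers only enqueue */
            if (elected)
                    for (struct node *n = queue_drain(&q); n; n = n->next)
                            ; /* process the batch */
            return 0;
    }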
>
> Tested:
>
> DDOS targeting a victim UDP socket, on a platform with 6 NUMA nodes
> (Intel(R) Xeon(R) 6985P-C)
>
> Before:
>
> nstat -n ; sleep 1 ; nstat | grep Udp
> Udp6InDatagrams 1004179 0.0
> Udp6InErrors 3117 0.0
> Udp6RcvbufErrors 3117 0.0
>
> After:
> nstat -n ; sleep 1 ; nstat | grep Udp
> Udp6InDatagrams 1116633 0.0
> Udp6InErrors 14197275 0.0
> Udp6RcvbufErrors 14197275 0.0
>
> We can see this host can now process 14.2M more packets per second
> while under attack, and the victim socket can receive 11% more
> packets.
>
> I used a small bpftrace program measuring time (in us) spent in
> __udp_enqueue_schedule_skb().
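
The script itself was not included, but a bpftrace program producing
histograms like the ones below might look like this (the map key and
the exact probes used are guesses based on the output shown):

    kprobe:__udp_enqueue_schedule_skb {
            @ts[tid] = nsecs;
    }

    kretprobe:__udp_enqueue_schedule_skb /@ts[tid]/ {
            @udp_enqueue_us[pid] = hist((nsecs - @ts[tid]) / 1000);
            delete(@ts[tid]);
    }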
>
> Before:
>
> @udp_enqueue_us[398]:
> [0] 24901 |@@@ |
> [1] 63512 |@@@@@@@@@ |
> [2, 4) 344827 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8) 244673 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [8, 16) 54022 |@@@@@@@@ |
> [16, 32) 222134 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32, 64) 232042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [64, 128) 4219 | |
> [128, 256) 188 | |
>
> After:
>
> @udp_enqueue_us[398]:
> [0] 5608855 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [1] 1111277 |@@@@@@@@@@ |
> [2, 4) 501439 |@@@@ |
> [4, 8) 102921 | |
> [8, 16) 29895 | |
> [16, 32) 43500 | |
> [32, 64) 31552 | |
> [64, 128) 979 | |
> [128, 256) 13 | |
>
> Note that the remaining bottleneck for this platform is in
> udp_drops_inc() because we limited struct numa_drop_counters
> to only two nodes so far.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Acked-by: Paolo Abeni <pabeni@...hat.com>
Reviewed-by: Willem de Bruijn <willemb@...gle.com>