Message-ID: <willemdebruijn.kernel.af97f0e88745@gmail.com>
Date: Mon, 22 Sep 2025 08:52:32 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Kuniyuki Iwashima <kuniyu@...gle.com>,
netdev@...r.kernel.org,
eric.dumazet@...il.com,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH v4 net-next] udp: remove busylock and add per NUMA queues
Eric Dumazet wrote:
> busylock was protecting UDP sockets against packet floods,
> but unfortunately was not protecting the host itself.
>
> Under stress, many cpus could spin while acquiring the busylock,
> and the NIC had to drop packets, or packets would be dropped
> in the cpu backlog if RPS/RFS were in place.
>
> This patch replaces the busylock with intermediate
> lockless queues (one queue per NUMA node).
>
> This means that fewer cpus have to acquire
> the UDP receive queue lock.
>
> Most cpus can either:
> - immediately drop the packet, or
> - queue it in their NUMA-aware lockless queue.
>
> Then one of the cpus is chosen to process this lockless queue
> in a batch.
>
> The batch only contains packets that were cooked on the same
> NUMA node, so the latency impact is very limited.
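
The batching pattern described above can be sketched in user space with
C11 atomics. This is only an illustration of the idea, not the kernel
code: producers push onto a per-NUMA-node lock-free stack, and (as one
possible election scheme) the producer that finds the stack empty is
the one chosen to splice off and process the whole batch. All names
below are made up for the example.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct node {
            struct node *next;
            /* packet payload would live here */
    };

    struct numa_queue {
            _Atomic(struct node *) head; /* lock-free LIFO, one per NUMA node */
    };

    /*
     * Push a node; returns true if the queue was previously empty,
     * meaning this cpu is elected to drain the batch.
     */
    static bool queue_push(struct numa_queue *q, struct node *n)
    {
            struct node *old = atomic_load_explicit(&q->head,
                                                    memory_order_relaxed);
            do {
                    n->next = old;
            } while (!atomic_compare_exchange_weak_explicit(&q->head, &old, n,
                            memory_order_release, memory_order_relaxed));
            return old == NULL;
    }

    /*
     * Detach the whole batch at once; only the elected cpu calls this,
     * then processes the nodes while holding the single receive queue
     * lock.
     */
    static struct node *queue_drain(struct numa_queue *q)
    {
            return atomic_exchange_explicit(&q->head, NULL,
                                            memory_order_acquire);
    }

    int main(void)
    {
            struct numa_queue q = { .head = NULL };
            struct node a = { 0 }, b = { 0 };
            bool elected = queue_push(&q, &a); /* first producer: elected */

            queue_push(&q, &b);                /* later producers only enqueue */
            if (elected)
                    for (struct node *n = queue_drain(&q); n; n = n->next)
                            ; /* process the batch */
            return 0;
    }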
>
> Tested:
>
> DDOS targeting a victim UDP socket, on a platform with 6 NUMA nodes
> (Intel(R) Xeon(R) 6985P-C)
>
> Before:
>
> nstat -n ; sleep 1 ; nstat | grep Udp
> Udp6InDatagrams 1004179 0.0
> Udp6InErrors 3117 0.0
> Udp6RcvbufErrors 3117 0.0
>
> After:
> nstat -n ; sleep 1 ; nstat | grep Udp
> Udp6InDatagrams 1116633 0.0
> Udp6InErrors 14197275 0.0
> Udp6RcvbufErrors 14197275 0.0
>
> We can see this host can now process 14.2M more packets per second
> while under attack, and the victim socket can receive 11% more
> packets.
>
> I used a small bpftrace program measuring time (in us) spent in
> __udp_enqueue_schedule_skb().
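
The script itself was not included, but a bpftrace program producing
histograms like the ones below might look like this (the map key and
the exact probes used are guesses based on the output shown):

    kprobe:__udp_enqueue_schedule_skb {
            @ts[tid] = nsecs;
    }

    kretprobe:__udp_enqueue_schedule_skb /@ts[tid]/ {
            @udp_enqueue_us[pid] = hist((nsecs - @ts[tid]) / 1000);
            delete(@ts[tid]);
    }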
>
> Before:
>
> @udp_enqueue_us[398]:
> [0] 24901 |@@@ |
> [1] 63512 |@@@@@@@@@ |
> [2, 4) 344827 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4, 8) 244673 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [8, 16) 54022 |@@@@@@@@ |
> [16, 32) 222134 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [32, 64) 232042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [64, 128) 4219 | |
> [128, 256) 188 | |
>
> After:
>
> @udp_enqueue_us[398]:
> [0] 5608855 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [1] 1111277 |@@@@@@@@@@ |
> [2, 4) 501439 |@@@@ |
> [4, 8) 102921 | |
> [8, 16) 29895 | |
> [16, 32) 43500 | |
> [32, 64) 31552 | |
> [64, 128) 979 | |
> [128, 256) 13 | |
>
> Note that the remaining bottleneck for this platform is in
> udp_drops_inc() because we limited struct numa_drop_counters
> to only two nodes so far.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Acked-by: Paolo Abeni <pabeni@...hat.com>
Reviewed-by: Willem de Bruijn <willemb@...gle.com>