Open Source and information security mailing list archives
 
Message-ID: <CANn89i+s3kAc_h0kP=enN4jEp1x0HCLaAX4H+X3P=LBGjzjZTw@mail.gmail.com>
Date: Mon, 22 Sep 2025 03:29:16 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Simon Horman <horms@...nel.org>, Willem de Bruijn <willemb@...gle.com>, 
	Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH v3 net-next] udp: remove busylock and add per NUMA queues

On Mon, Sep 22, 2025 at 2:41 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On 9/22/25 11:34 AM, Eric Dumazet wrote:
> > On Mon, Sep 22, 2025 at 1:47 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >> On Mon, Sep 22, 2025 at 1:37 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >>> On 9/21/25 11:58 AM, Eric Dumazet wrote:
> >>>> @@ -1718,14 +1699,23 @@ static int udp_rmem_schedule(struct sock *sk, int size)
> >>>>  int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> >>>>  {
> >>>>       struct sk_buff_head *list = &sk->sk_receive_queue;
> >>>> +     struct udp_prod_queue *udp_prod_queue;
> >>>> +     struct sk_buff *next, *to_drop = NULL;
> >>>> +     struct llist_node *ll_list;
> >>>>       unsigned int rmem, rcvbuf;
> >>>> -     spinlock_t *busy = NULL;
> >>>>       int size, err = -ENOMEM;
> >>>> +     int total_size = 0;
> >>>> +     int q_size = 0;
> >>>> +     int nb = 0;
> >>>>
> >>>>       rmem = atomic_read(&sk->sk_rmem_alloc);
> >>>>       rcvbuf = READ_ONCE(sk->sk_rcvbuf);
> >>>>       size = skb->truesize;
> >>>>
> >>>> +     udp_prod_queue = &udp_sk(sk)->udp_prod_queue[numa_node_id()];
> >>>> +
> >>>> +     rmem += atomic_read(&udp_prod_queue->rmem_alloc);
> >>>> +
> >>>>       /* Immediately drop when the receive queue is full.
> >>>>        * Cast to unsigned int performs the boundary check for INT_MAX.
> >>>>        */
> >>>
> >>> Double checking I'm reading the code correctly... AFAICS the rcvbuf size
> >>> check is now only per NUMA node, that means that each node can now add
> >>> at most sk_rcvbuf bytes to the socket receive queue simultaneously, am I
> >>> correct?
> >>
> >> This is a transient condition. In my tests with 6 NUMA nodes pushing
> >> packets very hard, I was not able to see a significant bump of
> >> sk_rmem_alloc (over sk_rcvbuf).
> >>
> >>
> >>
> >>>
> >>> What if the user-space process never reads the packets (or is very
> >>> slow)? I'm under the impression the max rcvbuf occupation will be
> >>> limited only by the memory accounting?!? (and not by sk_rcvbuf)
> >>
> >> Well, as soon as sk->sk_rmem_alloc is bigger than sk_rcvbuf, all
> >> further incoming packets are dropped.
> >>
> >> As you said, memory accounting is there.
> >>
> >> This could matter if we had thousands of UDP sockets under flood at
> >> the same time,
> >> but that would require thousands of cpus and/or NIC rx queues.
> >>
> >>
> >>
> >>>
> >>> Side note: I'm wondering if we could avoid the numa queue for connected
> >>> sockets? With early demux, and no nft/bridge in between, the path from
> >>> NIC to socket should be pretty fast, and the additional queuing possibly
> >>> visible?
> >>
> >> I tried this last week and got no difference in performance on my test machines.
> >>
> >> I can retry this and give you precise numbers before sending V4.
> >
> > I did my experiment again.
> >
> > Very little difference (1 or 2%, but it would need many runs to confirm).
> >
> > Also, loopback traffic would be unprotected (only RSS on a physical NIC
> > would properly use a single cpu for all packets).
> >
> > Looking at the performance profile of the cpus
>
> [...]
>
> Indeed delta looks in the noise range, thanks for checking.
>
> Just in case there is any doubt:
>
> Acked-by: Paolo Abeni <pabeni@...hat.com>

Thanks for the review Paolo, I will include your Acked-by to V4.
