Message-ID: <CANn89i+RbuL9oknRn8uACRF-MMam=LvO6pVoR7BOUk=f5S6iVA@mail.gmail.com>
Date: Mon, 22 Sep 2025 01:47:08 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Willem de Bruijn <willemb@...gle.com>,
Kuniyuki Iwashima <kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH v3 net-next] udp: remove busylock and add per NUMA queues
On Mon, Sep 22, 2025 at 1:37 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> Hi,
>
> On 9/21/25 11:58 AM, Eric Dumazet wrote:
> > @@ -1718,14 +1699,23 @@ static int udp_rmem_schedule(struct sock *sk, int size)
> > int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> > {
> > struct sk_buff_head *list = &sk->sk_receive_queue;
> > + struct udp_prod_queue *udp_prod_queue;
> > + struct sk_buff *next, *to_drop = NULL;
> > + struct llist_node *ll_list;
> > unsigned int rmem, rcvbuf;
> > - spinlock_t *busy = NULL;
> > int size, err = -ENOMEM;
> > + int total_size = 0;
> > + int q_size = 0;
> > + int nb = 0;
> >
> > rmem = atomic_read(&sk->sk_rmem_alloc);
> > rcvbuf = READ_ONCE(sk->sk_rcvbuf);
> > size = skb->truesize;
> >
> > + udp_prod_queue = &udp_sk(sk)->udp_prod_queue[numa_node_id()];
> > +
> > + rmem += atomic_read(&udp_prod_queue->rmem_alloc);
> > +
> > /* Immediately drop when the receive queue is full.
> > * Cast to unsigned int performs the boundary check for INT_MAX.
> > */
>
> Double checking I'm reading the code correctly... AFAICS the rcvbuf size
> check is now only per NUMA node, that means that each node can now add
> at most sk_rcvbuf bytes to the socket receive queue simultaneously, am I
> correct?
This is a transient condition. In my tests with 6 NUMA nodes pushing
packets very hard, I was not able to see a significant bump of
sk_rmem_alloc over sk_rcvbuf.
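
To spell out the bound: a given producer only sees sk->sk_rmem_alloc
plus its own node's in-flight bytes, so the theoretical worst case is
roughly nr_node_ids * sk_rcvbuf, and only until someone grabs the
receive queue lock. Simplified from the hunk above (drop path and the
INT_MAX cast detail omitted):

        rmem = atomic_read(&sk->sk_rmem_alloc);
        rcvbuf = READ_ONCE(sk->sk_rcvbuf);
        /* Other nodes' prod queues are not counted here, hence
         * the transient overshoot you describe.
         */
        rmem += atomic_read(&udp_prod_queue->rmem_alloc);
        if (rmem + skb->truesize > rcvbuf)
                goto drop;
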
>
> What if the user-space process never reads the packets (or is very
> slow)? I'm under the impression the max rcvbuf occupation will be
> limited only by the memory accounting?!? (and not by sk_rcvbuf)
Well, as soon as sk->sk_rmem_alloc is bigger than sk_rcvbuf, all
further incoming packets are dropped.
As you said, memory accounting is there as a backstop.
This could matter if we had thousands of UDP sockets under flood at
the same time, but that would require thousands of CPUs and/or NIC
rx queues.
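
The overshoot cannot persist because whichever cpu grabs the receive
queue spinlock drains all pending per-node llists into
sk_receive_queue and transfers their bytes to sk->sk_rmem_alloc, at
which point the global check is accurate again. In rough pseudo-code
(field and helper names approximate, error handling omitted):

        spin_lock(&list->lock);
        ll_list = llist_del_all(&udp_prod_queue->ll_root);
        ll_list = llist_reverse_order(ll_list);
        llist_for_each_entry_safe(skb, next, ll_list, ll_node) {
                q_size += skb->truesize;
                __skb_queue_tail(list, skb);
        }
        atomic_add(q_size, &sk->sk_rmem_alloc);
        atomic_sub(q_size, &udp_prod_queue->rmem_alloc);
        spin_unlock(&list->lock);
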
>
> Side note: I'm wondering if we could avoid the numa queue for connected
> sockets? With early demux, and no nft/bridge in between the path from
> NIC to socket should be pretty fast and possibly the additional queuing
> visible?
I tried this last week and saw no performance difference on my test machines.
I can retry this and give you precise numbers before sending V4.
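
For reference, what I tried was along these lines (hypothetical
sketch, not the exact diff I benchmarked; udp_enqueue_locked() is a
stand-in for a direct enqueue under the receive queue lock):

        /* Connected sockets are typically fed from a single NIC rx
         * queue after early demux, so skip the per-node producer
         * queue and take the receive queue lock directly.
         */
        if (sk->sk_state == TCP_ESTABLISHED)
                return udp_enqueue_locked(sk, skb);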