Message-ID: <221b8f60-0644-4744-93dc-a46a68411270@redhat.com>
Date: Mon, 22 Sep 2025 11:41:12 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski
<kuba@...nel.org>, Simon Horman <horms@...nel.org>,
Willem de Bruijn <willemb@...gle.com>, Kuniyuki Iwashima
<kuniyu@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH v3 net-next] udp: remove busylock and add per NUMA queues

On 9/22/25 11:34 AM, Eric Dumazet wrote:
> On Mon, Sep 22, 2025 at 1:47 AM Eric Dumazet <edumazet@...gle.com> wrote:
>> On Mon, Sep 22, 2025 at 1:37 AM Paolo Abeni <pabeni@...hat.com> wrote:
>>> On 9/21/25 11:58 AM, Eric Dumazet wrote:
>>>> @@ -1718,14 +1699,23 @@ static int udp_rmem_schedule(struct sock *sk, int size)
>>>>  int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
>>>>  {
>>>>  	struct sk_buff_head *list = &sk->sk_receive_queue;
>>>> +	struct udp_prod_queue *udp_prod_queue;
>>>> +	struct sk_buff *next, *to_drop = NULL;
>>>> +	struct llist_node *ll_list;
>>>>  	unsigned int rmem, rcvbuf;
>>>> -	spinlock_t *busy = NULL;
>>>>  	int size, err = -ENOMEM;
>>>> +	int total_size = 0;
>>>> +	int q_size = 0;
>>>> +	int nb = 0;
>>>>
>>>>  	rmem = atomic_read(&sk->sk_rmem_alloc);
>>>>  	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
>>>>  	size = skb->truesize;
>>>>
>>>> +	udp_prod_queue = &udp_sk(sk)->udp_prod_queue[numa_node_id()];
>>>> +
>>>> +	rmem += atomic_read(&udp_prod_queue->rmem_alloc);
>>>> +
>>>>  	/* Immediately drop when the receive queue is full.
>>>>  	 * Cast to unsigned int performs the boundary check for INT_MAX.
>>>>  	 */
>>>
>>> Double-checking I'm reading the code correctly... AFAICS the rcvbuf size
>>> check is now only per NUMA node, which means each node can now add at
>>> most sk_rcvbuf bytes to the socket receive queue simultaneously, am I
>>> correct?
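
To make the question concrete, here is a minimal sketch of what the
per-node check amounts to. The struct layout and the helper below are
assumptions for illustration only; apart from the identifiers visible
in the hunk above, nothing here is taken from the actual patch.

#include <linux/llist.h>
#include <linux/atomic.h>
#include <net/sock.h>

/* One producer queue per NUMA node (layout assumed for illustration):
 * producers append skbs to a lock-free llist and account their truesize
 * in rmem_alloc until the receiver splices them into sk_receive_queue.
 */
struct udp_prod_queue {
	struct llist_head	ll_root;
	atomic_t		rmem_alloc;
};

/* Shape of the boundary test in the hunk above: the node-local pending
 * bytes are added to sk_rmem_alloc before comparing against sk_rcvbuf,
 * so each node only sees its own backlog.
 */
static bool udp_node_rcvbuf_full(const struct sock *sk,
				 const struct udp_prod_queue *q,
				 unsigned int truesize)
{
	unsigned int rmem = atomic_read(&sk->sk_rmem_alloc) +
			    atomic_read(&q->rmem_alloc);

	return rmem + truesize > (unsigned int)READ_ONCE(sk->sk_rcvbuf);
}
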
>>
>> This is a transient condition. In my tests with 6 NUMA nodes pushing
>> packets very hard, I was not able to see a significant bump of
>> sk_rmem_alloc (over sk_rcvbuf).
>>
>>
>>
>>>
>>> What if the user-space process never reads the packets (or is very
>>> slow)? I'm under the impression the max rcvbuf occupation will then be
>>> limited only by the memory accounting (and not by sk_rcvbuf)?!?
>>
>> Well, as soon as sk->sk_rmem_alloc is bigger than sk_rcvbuf, all
>> further incoming packets are dropped.
>>
>> As you said, memory accounting is there.
>>
>> This could matter if we had thousands of UDP sockets under flood at
>> the same time,
>> but that would require thousands of cpus and/or NIC rx queues.
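
To put rough numbers on the transient window being discussed: with the
per-node check, each node can have close to sk_rcvbuf bytes in flight
before its list is spliced, so the short-lived upper bound per socket is
about nr_nodes * sk_rcvbuf. A throwaway user-space calculation (the
buffer value is just a typical net.core.rmem_default, not taken from the
patch):

#include <stdio.h>

int main(void)
{
	unsigned long sk_rcvbuf = 212992;	/* example: common net.core.rmem_default */
	unsigned int nr_nodes = 6;		/* Eric's test machine above */

	/* Transient worst case: every node fills its own allowance
	 * before any of it is spliced into sk_receive_queue.
	 */
	printf("~%lu bytes (%u x %lu)\n",
	       nr_nodes * sk_rcvbuf, nr_nodes, sk_rcvbuf);
	return 0;
}
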
>>
>>
>>
>>>
>>> Side note: I'm wondering if we could avoid the numa queue for connected
>>> sockets? With early demux, and no nft/bridge in between, the path from
>>> NIC to socket should be pretty fast, and the additional queuing might be
>>> visible?
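
A rough idea of what such a bypass could look like, purely hypothetical
and not part of the patch (connected UDP sockets have sk_state set to
TCP_ESTABLISHED):

#include <net/sock.h>
#include <net/tcp_states.h>

/* Hypothetical helper for the side note above: skip the per-node
 * producer queue for connected sockets, on the assumption that their
 * packets already arrive on a single RX queue / CPU via early demux,
 * so there is little receive-queue lock contention to hide.
 */
static bool udp_skip_prod_queue(const struct sock *sk)
{
	return sk->sk_state == TCP_ESTABLISHED;
}
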
>>
>> I tried this last week and got no difference in performance on my test machines.
>>
>> I can retry this and give you precise numbers before sending V4.
>
> I did my experiment again.
>
> Very little difference (1 or 2%, but it would need many runs to
> confirm).
>
> Also, loopback traffic would be unprotected (only RSS on a physical NIC
> would properly use a single CPU for all packets).
>
> Looking at the performance profile of the cpus
[...]
Indeed, the delta looks to be within the noise range, thanks for checking.
Just in case there is any doubt:
Acked-by: Paolo Abeni <pabeni@...hat.com>