Message-ID: <CANn89iK5-WQ-geM6nzz_WOBwc8_jt7HQUqXbm_eDceydvf0FJQ@mail.gmail.com>
Date: Fri, 22 Aug 2025 06:56:03 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Balazs Scheidler <bazsi77@...il.com>
Cc: netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [RFC, RESEND] UDP receive path batching improvement
On Fri, Aug 22, 2025 at 6:33 AM Balazs Scheidler <bazsi77@...il.com> wrote:
>
> On Fri, Aug 22, 2025 at 06:10:28AM -0700, Eric Dumazet wrote:
> > On Fri, Aug 22, 2025 at 5:56 AM Balazs Scheidler <bazsi77@...il.com> wrote:
> > >
> > > On Fri, Aug 22, 2025 at 02:37:28AM -0700, Eric Dumazet wrote:
> > > > On Fri, Aug 22, 2025 at 2:15 AM Balazs Scheidler <bazsi77@...il.com> wrote:
> > > > >
> > > > > On Fri, Aug 22, 2025 at 01:18:36AM -0700, Eric Dumazet wrote:
> > > > > > On Fri, Aug 22, 2025 at 1:15 AM Balazs Scheidler <bazsi77@...il.com> wrote:
> > > > > > > The condition above uses "sk->sk_rcvbuf >> 2" as the threshold that
> > > > > > > triggers the update of the counter.
> > > > > > >
> > > > > > > In our case (syslog receive path via udp), socket buffers are generally
> > > > > > > tuned up (on the order of 32MB or even more, I have seen 256MB as well), as
> > > > > > > the senders can generate spikes in their traffic and a lot of senders send
> > > > > > > to the same port. Due to latencies, these buffers sometimes accumulate MBs
> > > > > > > of data before the user-space process even has a chance to consume them.
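
For context, a stand-alone sketch of the batching behaviour referred to above:
memory released by dequeued datagrams is accumulated locally and only folded
back into the shared receive counter once it exceeds sk_rcvbuf >> 2. The
struct and field names below are illustrative, not the kernel's; they only
model the threshold arithmetic.

```
/* Toy model of the 25% trigger (names and numbers are illustrative, not the
 * kernel's): freed receive memory is accumulated locally and the shared
 * counter is only touched once sk_rcvbuf >> 2 has been drained.
 */
#include <stdio.h>

struct sock_model {
	int sk_rcvbuf;        /* receive buffer size, bytes                 */
	int rmem_alloc;       /* memory currently charged to the socket     */
	int deferred;         /* released by the reader, not yet credited   */
};

static void rmem_release(struct sock_model *sk, int size)
{
	sk->deferred += size;

	/* The trigger under discussion: defer the counter update until a
	 * quarter of the receive buffer has been drained.
	 */
	if (sk->deferred < (sk->sk_rcvbuf >> 2))
		return;

	sk->rmem_alloc -= sk->deferred;
	sk->deferred = 0;
}

int main(void)
{
	struct sock_model sk = { .sk_rcvbuf = 32 << 20, .rmem_alloc = 32 << 20 };

	/* With a 32MB buffer, roughly 8MB worth of datagrams is consumed
	 * before the charged memory is reduced at all.
	 */
	for (int i = 0; i < 6000; i++)
		rmem_release(&sk, 1500);

	printf("rmem_alloc=%d deferred=%d\n", sk.rmem_alloc, sk.deferred);
	return 0;
}
```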
> > > > > > >
> > > > > >
> > > > > >
> > > > > > This seems very high usage for a single UDP socket.
> > > > > >
> > > > > > Have you tried SO_REUSEPORT to spread incoming packets to more sockets
> > > > > > (and possibly more threads) ?
> > > > >
> > > > > Yes. I use SO_REUSEPORT (16 sockets), and I even use eBPF to distribute the
> > > > > load over multiple sockets evenly, instead of the normal load-balancing
> > > > > algorithm built into SO_REUSEPORT.
> > > > >
> > > >
> > > > Great. But if you have many receive queues, are you sure this choice does not
> > > > add false sharing ?
> > >
> > > I am not sure how that could trigger false sharing here. I am using a
> > > "socket" filter, which generates a random number modulo the number of
> > > sockets:
> > >
> > > ```
> > > #include "vmlinux.h"
> > > #include <bpf/bpf_helpers.h>
> > >
> > > int number_of_sockets;
> > >
> > > SEC("socket")
> > > int random_choice(struct __sk_buff *skb)
> > > {
> > >         if (number_of_sockets == 0)
> > >                 return -1;
> > >
> > >         return bpf_get_prandom_u32() % number_of_sockets;
> > > }
> > > ```
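
For readers following along, this is roughly how such a program ends up
attached to the reuseport group from userspace. The object file name, port
and socket count below are placeholders, and error handling plus the code
that sets number_of_sockets in the BPF object are omitted.

```
/* Hypothetical userspace side: attach the SEC("socket") program above to a
 * group of UDP sockets via SO_ATTACH_REUSEPORT_EBPF.
 */
#include <arpa/inet.h>
#include <bpf/libbpf.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_SOCKETS 16

static int open_group_socket(int port, int prog_fd)
{
	struct sockaddr_in addr = {
		.sin_family      = AF_INET,
		.sin_port        = htons(port),
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	int one = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	/* SO_REUSEPORT must be enabled before bind() so that all sockets
	 * join the same reuseport group. */
	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* Attaching the program to one member steers traffic for the whole
	 * group; the program's return value is the socket index. */
	if (prog_fd >= 0)
		setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
			   &prog_fd, sizeof(prog_fd));
	return fd;
}

int main(void)
{
	/* "reuseport_select.bpf.o" is a placeholder object file containing
	 * the SEC("socket") program shown above. */
	struct bpf_object *obj = bpf_object__open_file("reuseport_select.bpf.o", NULL);
	int fds[NUM_SOCKETS], prog_fd, i;

	bpf_object__load(obj);
	prog_fd = bpf_program__fd(
		bpf_object__find_program_by_name(obj, "random_choice"));

	for (i = 0; i < NUM_SOCKETS; i++)
		fds[i] = open_group_socket(514, i == 0 ? prog_fd : -1);

	/* ... hand fds[] to the worker threads and start receiving ... */
	pause();
	return 0;
}
```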
> >
> > How many receive queues does your NIC have (ethtool -l eth0) ?
> >
> > This filter causes huge contention on the receive queues and on various
> > socket fields, accessed by different CPUs.
> >
> > You should instead make the choice based on the napi_id (skb->napi_id)
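
A minimal sketch of the kind of napi_id-based selection being suggested here
(the program below is an illustration, not something from this thread):
skb->napi_id is readable from SEC("socket") programs, it is 0 when no NAPI id
was recorded, and keeping one NAPI instance mapped to one socket keeps each
receive queue's traffic on a single consumer.

```
/* Illustrative only: pick the reuseport socket from the packet's NAPI id,
 * so that all traffic from one receive queue lands on the same socket.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

int number_of_sockets;

SEC("socket")
int napi_choice(struct __sk_buff *skb)
{
	__u32 napi_id = skb->napi_id;

	/* napi_id == 0 means the driver did not record one; returning an
	 * out-of-range index falls back to the default reuseport hash.
	 */
	if (number_of_sockets == 0 || napi_id == 0)
		return -1;

	/* A simple, stable mapping from receive queue to socket index. */
	return napi_id % number_of_sockets;
}

char LICENSE[] SEC("license") = "GPL";
```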
>
> I don't have ssh access to the box, unfortunately. I'll look into napi_id,
> my historical understanding of the IP stack was that a single thread handles
> incoming datagrams, but I have to admit that knowledge has not aged well.
> Also, the kernel is ancient, 4.18 something, RHEL8 (no, I didn't
> have a say in that...).
>
> This box is a VM, but I am not even sure which virtualization stack is used; I
> am still finding out the number of receive queues.
I think this is the critical part. The optimal eBPF program depends on this.
In any case, the 25% threshold makes the usable capacity smaller,
so I would advise setting larger SO_RCVBUF values.
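
For reference, bumping the receive buffer from userspace looks like the sketch
below (the value is an example; the kernel doubles the request for bookkeeping
overhead and clamps it to net.core.rmem_max, unless SO_RCVBUFFORCE is used by
a process with CAP_NET_ADMIN).

```
/* Example only: request a 64MB receive buffer and print what was granted. */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int bytes = 64 << 20;          /* 64MB request, example value */
	socklen_t len = sizeof(bytes);

	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
	getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, &len);
	printf("effective SO_RCVBUF: %d bytes\n", bytes);
	return 0;
}
```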