Message-ID: <1480953245.18162.541.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Mon, 05 Dec 2016 07:54:05 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Paolo Abeni <pabeni@...hat.com>, netdev <netdev@...r.kernel.org>
Subject: Re: [RFC] udp: some improvements on RX path.
On Mon, 2016-12-05 at 16:37 +0100, Jesper Dangaard Brouer wrote:
> Do you think the splice technique would have the same performance
> benefit as having an MPMC queue with separate enqueue and dequeue locking?
> (like we have with skb_array/ptr_ring that avoids cache bouncing)?
I believe ring buffers make sense for critical points in the kernel,
but for an arbitrary number of TCP/UDP sockets in a host they are a big
increase in memory use, and a practical problem when SO_RCVBUF is
changed, since a dynamic resize of the ring buffer would be needed.
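
For reference, here is roughly what the skb_array model looks like; a
minimal sketch only (rx_ring and the 512-slot size are made-up, and the
capacity is fixed at init time, which is exactly the resize problem):

	#include <linux/skb_array.h>
	#include <linux/skbuff.h>

	static struct skb_array rx_ring;

	/* one-time setup: capacity is fixed here */
	static int rx_ring_setup(void)
	{
		return skb_array_init(&rx_ring, 512, GFP_KERNEL);
	}

	/* producer (softirq) side: takes only the producer lock */
	static void rx_ring_enqueue(struct sk_buff *skb)
	{
		if (skb_array_produce(&rx_ring, skb))
			kfree_skb(skb);		/* ring full -> drop */
	}

	/* consumer (recvmsg) side: takes only the consumer lock */
	static struct sk_buff *rx_ring_dequeue(void)
	{
		return skb_array_consume(&rx_ring);
	}

The win is that the producer and consumer locks sit in separate
cachelines, so the two sides do not bounce a shared lock.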
If you think about it, most sockets have few outstanding packets, like
0, 1, or 2. But they might also have ~100 packets, sometimes...
For most TCP/UDP sockets, a linked list is simply good enough.
(We only very recently converted the out-of-order receive queue to an
RB tree.)
Now, if _two_ linked lists are also good enough in the very rare case
of floods, I would use them, if they can offer us a 50% increase at
small memory cost.
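
To make the comparison concrete, the dequeue side of the two-list idea
would look roughly like this (a sketch only; input_q/reader_q are
made-up names, both initialized with skb_queue_head_init() at socket
setup). The reader drains a private list for free, and only takes the
shared queue lock to splice everything over in one shot when the
private list runs dry:

	#include <linux/skbuff.h>

	static struct sk_buff_head input_q;	/* softirq producer enqueues here */
	static struct sk_buff_head reader_q;	/* owned by the single reader */

	static struct sk_buff *two_list_dequeue(void)
	{
		struct sk_buff *skb = __skb_dequeue(&reader_q);

		if (skb)
			return skb;	/* fast path: no shared lock taken */

		spin_lock_bh(&input_q.lock);
		skb_queue_splice_tail_init(&input_q, &reader_q);
		spin_unlock_bh(&input_q.lock);

		return __skb_dequeue(&reader_q);
	}

That way the shared lock is contended once per burst instead of once
per packet, which is where a ~50% gain could plausibly come from.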
Then for very special cases, we have af_packet, which should be
optimized for all the fancy stuff.
If an application really receives more than 1.5 Mpps per UDP socket,
then the author should seriously consider SO_REUSEPORT, and have more
than 1 vcpu on their VM. I think cheap cloud offers are available from
many providers.
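
The userspace side of that suggestion is just a handful of lines; a
sketch, assuming one socket per worker thread, an arbitrary port, and
error handling elided:

	#include <netinet/in.h>
	#include <string.h>
	#include <sys/socket.h>

	static int make_worker_socket(void)
	{
		struct sockaddr_in addr;
		int one = 1;
		int fd = socket(AF_INET, SOCK_DGRAM, 0);

		setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_ANY);
		addr.sin_port = htons(9999);	/* arbitrary example port */
		bind(fd, (struct sockaddr *)&addr, sizeof(addr));
		return fd;
	}

Each worker calls this and binds to the same port; the kernel then
hashes incoming flows across the sockets, so no single per-socket queue
has to absorb the full 1.5 Mpps.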
The ring buffer queue might make sense in net/core/dev.c, since
we currently have 2 queues per cpu.
So you might want to experiment with that, because it looks like we
might move to a model where a single cpu (busy-polling) does all the
low-level RX processing from a single queue per NUMA node, then
dispatches the IP/{TCP|UDP} processing to other cpus.