Date:   Mon, 05 Dec 2016 07:54:05 -0800
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Paolo Abeni <pabeni@...hat.com>, netdev <netdev@...r.kernel.org>
Subject: Re: [RFC] udp: some improvements on RX path.

On Mon, 2016-12-05 at 16:37 +0100, Jesper Dangaard Brouer wrote:

> Do you think the splice technique would have the same performance
> benefit as an MPMC queue with separate enqueue and dequeue locking
> (like we have with skb_array/ptr_ring, which avoids cache-line bouncing)?

I believe ring buffers make sense for critical points in the kernel,
but for an arbitrary number of TCP/UDP sockets in a host they mean a
big increase in memory use, and a practical problem when SO_RCVBUF is
changed, since the ring buffer would then need to be resized dynamically.
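
(For reference, this is roughly what a fixed-size per-socket queue on
top of skb_array would look like. The udp_ring_* wrappers below are
made up for illustration; only the skb_array_*() calls are the real
API, and the sizing policy is the hypothetical part.)

#include <linux/skb_array.h>

struct udp_ring_queue {
	struct skb_array ring;
};

/* All 'slots' entries are allocated up front (e.g. sized from
 * SO_RCVBUF), even though most sockets only ever hold 0, 1 or 2
 * packets.  Changing SO_RCVBUF later means reallocating the ring.
 */
static int udp_ring_queue_init(struct udp_ring_queue *q, int slots)
{
	return skb_array_init(&q->ring, slots, GFP_KERNEL);
}

/* Producer path: serialized on the producer lock only. */
static int udp_ring_enqueue(struct udp_ring_queue *q, struct sk_buff *skb)
{
	return skb_array_produce(&q->ring, skb);	/* non-zero if full */
}

/* Consumer path: serialized on the consumer lock only, so the common
 * case does not bounce cache lines with the producer.
 */
static struct sk_buff *udp_ring_dequeue(struct udp_ring_queue *q)
{
	return skb_array_consume(&q->ring);
}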

If you think about it, most sockets have few outstanding packets, like
0, 1 or 2. But they might also have ~100 packets queued, sometimes.

For most TCP/UDP sockets, a linked list is simply good enough.
(We only very recently converted the out-of-order receive queue to an
RB tree.)

Now, if _two_ linked lists are also good enough in the very rare case
of floods, I would use two linked lists, if they can offer us a 50%
increase at a small memory cost.
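
Something like this two-list sketch, where the reader refills a private
list with one splice and then dequeues without the producer lock (the
names here are illustrative only, not actual kernel code):

#include <linux/skbuff.h>
#include <linux/spinlock.h>

struct two_list_queue {
	struct sk_buff_head input;	/* shared with the producer */
	struct sk_buff_head reader;	/* private to the consumer */
};

static void tlq_init(struct two_list_queue *q)
{
	skb_queue_head_init(&q->input);
	skb_queue_head_init(&q->reader);
}

/* Producer side: short critical section on input.lock only. */
static void tlq_enqueue(struct two_list_queue *q, struct sk_buff *skb)
{
	skb_queue_tail(&q->input, skb);
}

/* Consumer side: grab everything the producer queued in one splice,
 * then dequeue lock-free from the private list.
 */
static struct sk_buff *tlq_dequeue(struct two_list_queue *q)
{
	struct sk_buff *skb = __skb_dequeue(&q->reader);

	if (!skb && !skb_queue_empty(&q->input)) {	/* racy peek, harmless */
		spin_lock_irq(&q->input.lock);
		skb_queue_splice_tail_init(&q->input, &q->reader);
		spin_unlock_irq(&q->input.lock);
		skb = __skb_dequeue(&q->reader);
	}
	return skb;
}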

Then, for very special cases, we have af_packet, which should be
optimized for all the fancy stuff.

If an application really receives more than 1.5 Mpps on a single UDP
socket, then its author should seriously consider SO_REUSEPORT, and
more than one vCPU on their VM. I think cheap cloud offers are
available from many providers.
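
For reference, the userspace side of SO_REUSEPORT is trivial (Linux
3.9+; this is only a sketch, error handling omitted). Each worker
creates its own socket this way and recvmsg()s on it, and the kernel
spreads incoming flows across the group:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

static int make_reuseport_socket(uint16_t port)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in addr;

	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	return fd;	/* one socket per worker/vCPU */
}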

The ring buffer queue might make sense in net/core/dev.c, since
we currently have 2 queues per cpu.

So you might want to experiment with that, because it looks like we
might move to a model where a single cpu does all the low-level RX
processing (busy polling) from a single queue per NUMA node, then
dispatches the IP/{TCP|UDP} processing to other cpus.



