Message-ID: <57166719.4070209@solarflare.com>
Date:	Tue, 19 Apr 2016 18:12:57 +0100
From:	Edward Cree <ecree@...arflare.com>
To:	Tom Herbert <tom@...bertland.com>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	<linux-net-drivers@...arflare.com>
Subject: Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv

On 19/04/16 16:46, Tom Herbert wrote:
> On Tue, Apr 19, 2016 at 7:50 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> We have hard time to deal with latencies already, and maintaining some
>> sanity in the stack(s)
> Right, this is significant complexity for a fairly narrow use case.
Why do you say the use case is narrow?  This approach should increase
packet rate for any (non-GROed) traffic, whether for local delivery or
forwarding.  If you're line-rate limited, it'll save CPU time instead.
The only reason I focused my testing on single-byte UDP is that the
benefits are more easily measured in that case.

If anything, the use case is broader than GRO, because GRO can't be used
for datagram protocols where packet boundaries must be maintained.
And because the listified processing shares at least part of its code with
the regular stack, it adds less complexity than GRO, which has to have
essentially its own receive stack, _and_ code to coalesce the results
back into a superframe.
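
To illustrate the point about packet boundaries, here is a toy sketch in
plain userspace C. It is not kernel code and not how GRO is implemented;
struct dgram, coalesce() and bundle_lengths() are hypothetical names made
up for this example. It only shows that naive coalescing into one
superframe makes two different datagram sequences look identical, while
a bundle (list) keeps each datagram's length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical datagram: a payload pointer and its length. */
struct dgram {
    const char *data;
    size_t len;
};

/*
 * Toy coalescing: concatenate all payloads into one buffer.
 * Returns the total length, or 0 if the buffer is too small.
 * The individual datagram lengths are not recorded anywhere.
 */
static size_t coalesce(const struct dgram *d, size_t n, char *out, size_t cap)
{
    size_t off = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (d[i].len > cap - off)
            return 0;
        memcpy(out + off, d[i].data, d[i].len);
        off += d[i].len;
    }
    return off;
}

/*
 * Toy bundle: the packets travel together as a list, but each one
 * stays separate, so the receiver still sees every boundary.
 * Writes each datagram's length to lens[] and returns the count.
 */
static size_t bundle_lengths(const struct dgram *d, size_t n, size_t *lens)
{
    for (size_t i = 0; i < n; i++)
        lens[i] = d[i].len;
    return n;
}
```

With the sequences {"ab", "c"} and {"a", "bc"}, coalesce() produces the
same three bytes for both, whereas bundle_lengths() reports 2,1 and 1,2.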

I think if we pushed bundled RX all the way up to the TCP layer, it might
also be faster than GRO, because it avoids the work of coalescing
superframes; also, going through the GRO callbacks for each packet could
end up blowing the icache in the same way the regular stack does.
If bundling did prove faster, we could then remove GRO, and overall
complexity would be _reduced_.
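
As a rough illustration of the icache argument, the following userspace C
sketch contrasts per-packet and listified traversal of two toy layers.
Everything in it (struct pkt, toy_l3(), toy_l4(), the trace) is
hypothetical and stands in for real stack layers; the count of layer
switches is only a crude proxy for how often the code working set
changes, not a measurement.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACE 64

/* Hypothetical packet: an id plus a flag set by each toy layer. */
struct pkt {
    int id;
    int l3_done;
    int l4_done;
};

/* Records the order in which (layer, packet) pairs are visited. */
struct trace {
    char layer[MAX_TRACE];
    int pkt_id[MAX_TRACE];
    size_t len;
};

static void record(struct trace *t, char layer, int id)
{
    assert(t->len < MAX_TRACE);
    t->layer[t->len] = layer;
    t->pkt_id[t->len] = id;
    t->len++;
}

/* Stand-ins for the work done at two layers of the receive path. */
static void toy_l3(struct pkt *p, struct trace *t)
{
    p->l3_done = 1;
    record(t, '3', p->id);
}

static void toy_l4(struct pkt *p, struct trace *t)
{
    p->l4_done = 1;
    record(t, '4', p->id);
}

/* Per-packet: each packet runs through every layer before the next. */
static void rx_per_packet(struct pkt *pkts, size_t n, struct trace *t)
{
    for (size_t i = 0; i < n; i++) {
        toy_l3(&pkts[i], t);
        toy_l4(&pkts[i], t);
    }
}

/* Listified: each layer handles the whole bundle, then hands it on. */
static void rx_listified(struct pkt *pkts, size_t n, struct trace *t)
{
    for (size_t i = 0; i < n; i++)
        toy_l3(&pkts[i], t);
    for (size_t i = 0; i < n; i++)
        toy_l4(&pkts[i], t);
}

/* Number of times consecutive trace entries belong to different layers. */
static size_t layer_switches(const struct trace *t)
{
    size_t switches = 0;

    for (size_t i = 1; i < t->len; i++)
        if (t->layer[i] != t->layer[i - 1])
            switches++;
    return switches;
}
```

For a bundle of four packets, the per-packet order alternates layers on
every step (seven switches), while the listified order changes layer
once. Both variants do the same work on each packet and keep the packets
separate.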

But I admit it may be a long shot.

-Ed