lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 19 Apr 2016 18:12:57 +0100
From:	Edward Cree <ecree@...arflare.com>
To:	Tom Herbert <tom@...bertland.com>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	<linux-net-drivers@...arflare.com>
Subject: Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv

On 19/04/16 16:46, Tom Herbert wrote:
> On Tue, Apr 19, 2016 at 7:50 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> We have hard time to deal with latencies already, and maintaining some
>> sanity in the stack(s)
> Right, this is significant complexity for a fairly narrow use case.
Why do you say the use case is narrow?  This approach should increase
packet rate for any (non-GROed) traffic, whether for local delivery or
forwarding.  If you're line-rate limited, it'll save CPU time instead.
The only reason I focused my testing on single-byte UDP is because the
benefits are more easily measured in that case.

If anything, the use case is broader than GRO, because GRO can't be used
for datagram protocols where packet boundaries must be maintained.
And because the listified processing is at least partly sharing code with
the regular stack, it's less complexity than GRO which has to have
essentially its own receive stack, _and_ code to coalesce the results
back into a superframe.

I think if we pushed bundled RX all the way up to the TCP layer, it might
potentially also be faster than GRO, because it avoids the work of
coalescing superframes; plus going through the GRO callbacks for each
packet could end up blowing icache in the same way the regular stack does.
If bundling did prove faster, we could then remove GRO, and overall
complexity would be _reduced_.

But I admit it may be a long shot.

-Ed

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ