Date:	Thu, 21 Jan 2016 08:38:38 -0800
From:	Tom Herbert <tom@...bertland.com>
To:	Jesper Dangaard Brouer <brouer@...hat.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Or Gerlitz <gerlitz.or@...il.com>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.duyck@...il.com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Daniel Borkmann <borkmann@...earbox.net>,
	Marek Majkowski <marek@...udflare.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Florian Westphal <fw@...len.de>,
	Paolo Abeni <pabeni@...hat.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Amir Vadai <amirva@...il.com>
Subject: Re: Optimizing instruction-cache, more packets at each stage

On Thu, Jan 21, 2016 at 4:23 AM, Jesper Dangaard Brouer
<brouer@...hat.com> wrote:
> On Wed, 20 Jan 2016 15:27:38 -0800
> Tom Herbert <tom@...bertland.com> wrote:
>
>> weaknesses of Toeplitz we talked about recently and the fact that
>> Jenkins is really fast to compute, I am starting to think maybe we
>> should always do a software hash and not rely on HW for it...
>
> Please don't enforce a software hash.  You are proposing a hash
> computation per packet which costs in the area of 50-100 nanosec (?).
> And on data which is cache cold (even with DDIO, you take the L3
> cache cost/hit).
>
I clock the Jenkins hash computation itself at ~6 nsec (not counting
cache misses), but your point is taken.

> Consider the increase in network hardware speeds.
>
> Worst-case (pkt size 64 bytes) time between packets:
>  *  10 Gbit/s -> 67.2 nanosec
>  *  40 Gbit/s -> 16.8 nanosec
>  * 100 Gbit/s ->  6.7 nanosec
>
> Adding such a per packet cost is not going to fly.
>
Sure, but the receive path is parallelized. Improving parallelism has
consistently shown much more impact than attempting to optimize for
cache misses. The primary goal is not to drive 100Gbps of 64-byte
packets from a single CPU. It is one benchmark of many we should look
at to measure efficiency of the data path, but I've yet to see any
real workload that requires that...

Regardless of anything, we need to load packet headers into CPU cache
to do protocol processing. I'm not sure I see how trying to defer that
as long as possible helps except in cases where the packet is crossing
CPU cache boundaries and can eliminate cache misses completely (not
just move them around from one function to another).

Tom

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
