Message-ID: <BN1PR0301MB077010E0AC22812F390C14CACACC0@BN1PR0301MB0770.namprd03.prod.outlook.com>
Date: Thu, 14 Jan 2016 18:35:10 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
CC: Tom Herbert <tom@...bertland.com>,
David Miller <davem@...emloft.net>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
KY Srinivasan <kys@...rosoft.com>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH net-next] hv_netvsc: don't make assumptions on struct
flow_keys layout
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
> Sent: Thursday, January 14, 2016 1:24 PM
> To: One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
> Cc: Tom Herbert <tom@...bertland.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; David Miller <davem@...emloft.net>;
> vkuznets@...hat.com; netdev@...r.kernel.org; KY Srinivasan
> <kys@...rosoft.com>; devel@...uxdriverproject.org; linux-
> kernel@...r.kernel.org
> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> struct flow_keys layout
>
> On Thu, 2016-01-14 at 17:53 +0000, One Thousand Gnomes wrote:
> > > These results for Toeplitz are not plausible. Given random input you
> > > cannot expect any hash function to produce such uniform results. I
> > > suspect either your input data is biased or how you're applying the
> > > hash is.
> > >
> > > When I run 64 random IPv4 3-tuples through Toeplitz and Jenkins I get
> > > something more reasonable:
> >
> > IPv4 address patterns are not random. Nothing like it. A long long time
> > ago we did do a bunch of tuning for network hashes using big porn site
> > data sets. Random it was not.
> >
>
> I ran my tests with non random IPV4 addresses, as I had 2 hosts,
> one server, one client. (typical benchmark stuff)
>
> The only 'random' part was the ports, so maybe ~20 bits of entropy,
> considering how we allocate ports during connect() to a given
> destination to avoid port reuse.
>
> > It's probably hard to repeat that exercise now with geo specific routing,
> > and all the front end caches and redirectors on big sites but I'd
> > strongly suggest random input is not a good test, and also that you need
> > to worry more about hash attacks than perfect distributions.
>
> Anyway, the exercise is not to find a hash that exactly splits 128 flows
> into 16 buckets, according to the number of flows per bucket.
>
> Maybe only 4 flows are sending at 3Gbits, and others are sending at 100
> kbits. There is no way the driver can predict the future.
>
> This is why we prefer to select a queue given the cpu sending the
> packet. This permits a natural shift based on actual load, and is the
> default on linux (see XPS in Documentation/networking/scaling.txt)
>
> Only this driver has a selection based on a flow 'hash'.

Also, the port number selection may not be random either. For example,
the well-known network throughput test tool, iperf, uses port numbers with
equal increments between them. We tested these non-random cases, and found
that the Toeplitz hash distributes them evenly, while the Jenkins hash
produces an uneven distribution.
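
For illustration, here is a minimal userspace sketch of the Toeplitz hash
run over an equal-increment port sequence like iperf's. The key below is
the example key from Microsoft's RSS verification suite, and the addresses
and ports are made-up examples, not values from our tests:

    /* toeplitz_demo.c -- standalone sketch, not driver code */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>

    static const uint8_t key[40] = {
        0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
        0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
        0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
        0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
        0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
    };

    static uint32_t toeplitz_hash(const uint8_t *data, int len)
    {
        uint32_t hash = 0;
        /* 32-bit window into the key, slid left one bit per input bit */
        uint32_t window = ((uint32_t)key[0] << 24) | (key[1] << 16) |
                          (key[2] << 8) | key[3];
        int kbit = 32, i, b;

        for (i = 0; i < len; i++) {
            for (b = 7; b >= 0; b--) {
                if (data[i] & (1 << b))
                    hash ^= window;
                window <<= 1;
                if (kbit < 8 * (int)sizeof(key))
                    window |= (key[kbit / 8] >> (7 - kbit % 8)) & 1;
                kbit++;
            }
        }
        return hash;
    }

    int main(void)
    {
        /* 4-tuple in network order: saddr, daddr, sport, dport */
        uint8_t tuple[12];
        uint32_t saddr = htonl(0xc0a80101);  /* 192.168.1.1, example */
        uint32_t daddr = htonl(0xc0a80102);  /* 192.168.1.2, example */
        uint16_t dport = htons(5001);        /* iperf default port   */
        int i;

        memcpy(tuple, &saddr, 4);
        memcpy(tuple + 4, &daddr, 4);
        memcpy(tuple + 10, &dport, 2);

        /* equal-increment source ports, as iperf allocates them */
        for (i = 0; i < 16; i++) {
            uint16_t sport = htons(40000 + i);
            memcpy(tuple + 8, &sport, 2);
            printf("sport %d -> bucket %u\n", 40000 + i,
                   (unsigned)(toeplitz_hash(tuple, sizeof(tuple)) & 0xf));
        }
        return 0;
    }

Mapping the low 4 bits of each hash to 16 buckets, as above, is how one
can compare the spread of the two hash functions over such a sequence.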
I'm aware of the test from Tom Herbert <tom@...bertland.com>, which
shows similar results for Toeplitz vs. Jenkins with random inputs.
In summary, Toeplitz performs better with non-random inputs, and performs
similarly to Jenkins with random inputs (which may not be the case in the
real world). So we still prefer to use the Toeplitz hash.
To minimize the computational overhead, we may consider putting the hash
in a per-connection cache in the TCP layer, so it only needs to be
computed once (a sketch of this follows the table below). But even with
the computation overhead at this moment, the throughput based on the
Toeplitz hash is better than Jenkins:
Throughput (Gbps) comparison:
#conn   Toeplitz   Jenkins
32      26.6       23.2
64      32.1       23.4
128     29.1       24.1
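
A rough sketch of the per-connection cache idea mentioned above (the
struct and function names are hypothetical, not existing kernel API;
toeplitz_hash() is the routine sketched earlier):

    struct conn {
        uint8_t tuple[12];   /* saddr, daddr, sport, dport (network order) */
        uint32_t txhash;     /* cached hash; 0 means not yet computed */
    };

    static uint32_t conn_tx_hash(struct conn *c)
    {
        if (!c->txhash) {
            uint32_t h = toeplitz_hash(c->tuple, sizeof(c->tuple));
            c->txhash = h ? h : 1;  /* reserve 0 as "unset" */
        }
        return c->txhash;  /* every later packet reuses the cached value */
    }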
Also, to the question from Eric Dumazet <eric.dumazet@...il.com> -- no,
there is no limit on the number of connections per VMBus channel. But if
one channel has many more connections than the others, the unbalanced
workload slows down the overall throughput.
The purpose of the send indirection table is to shift the workload by
changing the mapping of table entries to channels. The host sends an
updated table to the guest from time to time. But if the hash function
distributes too many connections into one table entry, they cannot be
spread across different channels.
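
In pseudo-code terms, the per-packet channel pick is roughly the
following (a simplified sketch; the actual table size and update message
format are defined by the host):

    #define SEND_TAB_SIZE 16  /* example size */

    /* entry -> channel mapping, rewritten when the host sends an update */
    static uint32_t send_table[SEND_TAB_SIZE];

    static uint32_t select_channel(uint32_t hash)
    {
        /* The hash alone picks the table entry. If too many flows
         * land in one entry, no host-side remapping of entries to
         * channels can pull them apart. */
        return send_table[hash % SEND_TAB_SIZE];
    }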
Thanks to everyone who joined the discussion.
Thanks,
- Haiyang