netdev - Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct flow

Date:	Thu, 14 Jan 2016 11:41:02 -0800
From:	Tom Herbert <tom@...bertland.com>
To:	Haiyang Zhang <haiyangz@...rosoft.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	David Miller <davem@...emloft.net>,
	"vkuznets@...hat.com" <vkuznets@...hat.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	KY Srinivasan <kys@...rosoft.com>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct
 flow_keys layout

On Thu, Jan 14, 2016 at 11:15 AM, Haiyang Zhang <haiyangz@...rosoft.com> wrote:
>
>
>> -----Original Message-----
>> From: Tom Herbert [mailto:tom@...bertland.com]
>> Sent: Thursday, January 14, 2016 1:49 PM
>> To: Haiyang Zhang <haiyangz@...rosoft.com>
>> Cc: Eric Dumazet <eric.dumazet@...il.com>; One Thousand Gnomes
>> <gnomes@...rguk.ukuu.org.uk>; David Miller <davem@...emloft.net>;
>> vkuznets@...hat.com; netdev@...r.kernel.org; KY Srinivasan
>> <kys@...rosoft.com>; devel@...uxdriverproject.org;
>> linux-kernel@...r.kernel.org
>> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
>> struct flow_keys layout
>>
>> On Thu, Jan 14, 2016 at 10:35 AM, Haiyang Zhang <haiyangz@...rosoft.com>
>> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
>> >> Sent: Thursday, January 14, 2016 1:24 PM
>> >> To: One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
>> >> Cc: Tom Herbert <tom@...bertland.com>; Haiyang Zhang
>> >> <haiyangz@...rosoft.com>; David Miller <davem@...emloft.net>;
>> >> vkuznets@...hat.com; netdev@...r.kernel.org; KY Srinivasan
>> >> <kys@...rosoft.com>; devel@...uxdriverproject.org;
>> >> linux-kernel@...r.kernel.org
>> >> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
>> >> struct flow_keys layout
>> >>
>> >> On Thu, 2016-01-14 at 17:53 +0000, One Thousand Gnomes wrote:
>> >> > > These results for Toeplitz are not plausible. Given random input
>> >> > > you cannot expect any hash function to produce such uniform
>> >> > > results. I suspect either your input data is biased or how you're
>> >> > > applying the hash is.
>> >> > >
>> >> > > When I run 64 random IPv4 3-tuples through Toeplitz and Jenkins I
>> >> > > get something more reasonable:
>> >> >
>> >> > IPv4 address patterns are not random. Nothing like it. A long long
>> >> > time ago we did do a bunch of tuning for network hashes using big
>> >> > porn site data sets. Random it was not.
>> >> >
>> >>
>> >> I ran my tests with non random IPV4 addresses, as I had 2 hosts,
>> >> one server, one client. (typical benchmark stuff)
>> >>
>> >> The only 'random' part was the ports, so maybe ~20 bits of entropy,
>> >> considering how we allocate ports during connect() to a given
>> >> destination to avoid port reuse.
>> >>
>> >> > It's probably hard to repeat that exercise now with geo specific
>> >> > routing, and all the front end caches and redirectors on big sites
>> >> > but I'd strongly suggest random input is not a good test, and also
>> >> > that you need to worry more about hash attacks than perfect
>> >> > distributions.
>> >>
>> >> Anyway, the exercise is not to find a hash that exactly splits 128
>> >> flows into 16 buckets, according to the number of flows per bucket.
>> >>
>> >> Maybe only 4 flows are sending at 3Gbits, and others are sending at
>> >> 100 kbits. There is no way the driver can predict the future.
>> >>
>> >> This is why we prefer to select a queue given the cpu sending the
>> >> packet. This permits a natural shift based on actual load, and is the
>> >> default on linux (see XPS in Documentation/networking/scaling.txt)
>> >>
>> >> Only this driver has a selection based on a flow 'hash'.
>> >
>> > Also, the port number selection may not be random either. For example,
>> > the well-known network throughput test tool, iperf, uses port numbers
>> > with equal increments among them. We tested these non-random cases, and
>> > found that the Toeplitz hash distributes evenly, but the Jenkins hash
>> > has a non-even distribution.
>> >
>> > I'm aware of the test from Tom Herbert <tom@...bertland.com>, which
>> > shows similar results for Toeplitz vs. Jenkins with random inputs.
>> >
>> > In summary, the Toeplitz performs better in case of non-random inputs,
>> > and performs similar to Jenkins in random inputs (which may not be the
>> > case in real world). So we still prefer to use Toeplitz hash.
>> >
>> You are basing your conclusions on one toy benchmark. I don't believe
>> that a realistically loaded web server is going to consistently give
>> you tuples that happen to somehow fit into a nice model so that the
>> bias benefits your load distribution.
>>
>> > To minimize the computational overhead, we may consider putting the hash
>> > in a per-connection cache in the TCP layer, so it only needs to be
>> > computed once. But, even with the computation overhead at this moment,
>> > the throughput based on Toeplitz hash is better than Jenkins:
>> > Throughput (Gbps) comparison:
>> > #conn           Toeplitz        Jenkins
>> > 32              26.6            23.2
>> > 64              32.1            23.4
>> > 128             29.1            24.1
>> >
>> You don't need to do that. We already store a random hash value in the
>> connection context. If you want to make it non-random then just
>> replace that with a simple global counter. This will have the exact
>> same effect that you see in your tests without needing any expensive
>> computation.
>
> Could you point me to the data field of connection context where this
> hash value is stored? Is it computed only one time?
>
sk_txhash in struct sock. It is set to a random number on a TCP or UDP
connect call. It can be reset to a different random value when the
connection is seen to be having trouble (sk_rethink_txhash).
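
A rough userspace sketch of the behaviour described above, for illustration
only. Everything named model_* is hypothetical and is not kernel API; rand()
stands in for the kernel's pseudo-random source, and the modulo mapping onto
queues is an assumption, not what any particular driver does. The counter
variant models the "simple global counter" suggested earlier in the thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of struct sock. */
struct model_sock {
	uint32_t txhash;	/* models sk_txhash; 0 means "not set" */
};

/* Assumption: rand() replaces the kernel's pseudo-random generator. */
static uint32_t model_rndhash(void)
{
	uint32_t v = ((uint32_t)rand() << 16) ^ (uint32_t)rand();

	return v ? v : 1;	/* never hand out 0, which means "not set" */
}

/* Done once, at connect time: store a random per-connection value. */
static void model_set_txhash(struct model_sock *sk)
{
	sk->txhash = model_rndhash();
}

/* Done when the connection looks troubled: pick a different value. */
static void model_rethink_txhash(struct model_sock *sk)
{
	if (sk->txhash)
		model_set_txhash(sk);
}

/* Non-random variant: consecutive connections get consecutive values. */
static uint32_t model_counter;

static void model_set_txhash_counter(struct model_sock *sk)
{
	if (++model_counter == 0)	/* skip 0 on wrap-around */
		model_counter = 1;
	sk->txhash = model_counter;
}

/* Assumed mapping of the stored value onto one of nqueues TX queues. */
static unsigned int model_pick_queue(const struct model_sock *sk,
				     unsigned int nqueues)
{
	return sk->txhash % nqueues;
}
```

With the counter variant and this mapping, 32 consecutive connections over
16 queues land exactly 2 per queue, with no per-packet hash computation.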

Also when you say "Toeplitz performs better in case of non-random
inputs" please quantify exactly how your input data is not random.
What header changes with each connection in your test?

> Thanks!
>
> - Haiyang
>
>
>