Message-ID: <BN1PR0301MB0770157002F6294CFF594C13CACC0@BN1PR0301MB0770.namprd03.prod.outlook.com>
Date: Thu, 14 Jan 2016 19:15:27 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Tom Herbert <tom@...bertland.com>
CC: Eric Dumazet <eric.dumazet@...il.com>,
One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
David Miller <davem@...emloft.net>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
KY Srinivasan <kys@...rosoft.com>,
"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH net-next] hv_netvsc: don't make assumptions on struct
flow_keys layout
> -----Original Message-----
> From: Tom Herbert [mailto:tom@...bertland.com]
> Sent: Thursday, January 14, 2016 1:49 PM
> To: Haiyang Zhang <haiyangz@...rosoft.com>
> Cc: Eric Dumazet <eric.dumazet@...il.com>; One Thousand Gnomes
> <gnomes@...rguk.ukuu.org.uk>; David Miller <davem@...emloft.net>;
> vkuznets@...hat.com; netdev@...r.kernel.org; KY Srinivasan
> <kys@...rosoft.com>; devel@...uxdriverproject.org; linux-
> kernel@...r.kernel.org
> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> struct flow_keys layout
>
> On Thu, Jan 14, 2016 at 10:35 AM, Haiyang Zhang <haiyangz@...rosoft.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
> >> Sent: Thursday, January 14, 2016 1:24 PM
> >> To: One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
> >> Cc: Tom Herbert <tom@...bertland.com>; Haiyang Zhang
> >> <haiyangz@...rosoft.com>; David Miller <davem@...emloft.net>;
> >> vkuznets@...hat.com; netdev@...r.kernel.org; KY Srinivasan
> >> <kys@...rosoft.com>; devel@...uxdriverproject.org; linux-
> >> kernel@...r.kernel.org
> >> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> >> struct flow_keys layout
> >>
> >> On Thu, 2016-01-14 at 17:53 +0000, One Thousand Gnomes wrote:
> >> > > These results for Toeplitz are not plausible. Given random input you
> >> > > cannot expect any hash function to produce such uniform results. I
> >> > > suspect either your input data is biased or how you're applying the
> >> > > hash is.
> >> > >
> >> > > When I run 64 random IPv4 3-tuples through Toeplitz and Jenkins I get
> >> > > something more reasonable:
> >> >
> >> > IPv4 address patterns are not random. Nothing like it. A long long time
> >> > ago we did do a bunch of tuning for network hashes using big porn site
> >> > data sets. Random it was not.
> >> >
> >>
> >> I ran my tests with non random IPV4 addresses, as I had 2 hosts,
> >> one server, one client. (typical benchmark stuff)
> >>
> >> The only 'random' part was the ports, so maybe ~20 bits of entropy,
> >> considering how we allocate ports during connect() to a given
> >> destination to avoid port reuse.
> >>
> >> > It's probably hard to repeat that exercise now with geo specific routing,
> >> > and all the front end caches and redirectors on big sites but I'd
> >> > strongly suggest random input is not a good test, and also that you need
> >> > to worry more about hash attacks than perfect distributions.
> >>
> >> Anyway, the exercise is not to find a hash that exactly splits 128 flows
> >> into 16 buckets, according to the number of flows per bucket.
> >>
> >> Maybe only 4 flows are sending at 3Gbits, and others are sending at 100
> >> kbits. There is no way the driver can predict the future.
> >>
> >> This is why we prefer to select a queue given the cpu sending the
> >> packet. This permits a natural shift based on actual load, and is the
> >> default on linux (see XPS in Documentation/networking/scaling.txt)
> >>
> >> Only this driver has a selection based on a flow 'hash'.
> >
> > Also, the port number selection may not be random either. For example,
> > the well-known network throughput test tool, iperf, uses port numbers with
> > equal increments among them. We tested these non-random cases, and found
> > that the Toeplitz hash distributed evenly, but the Jenkins hash had a
> > non-even distribution.
> >
> > I'm aware of the test from Tom Herbert <tom@...bertland.com>, which
> > shows similar results for Toeplitz vs. Jenkins with random inputs.
> >
> > In summary, Toeplitz performs better with non-random inputs,
> > and performs similarly to Jenkins with random inputs (which may not be the
> > case in the real world). So we still prefer to use the Toeplitz hash.
> >
> You are basing your conclusions on one toy benchmark. I don't believe
> that a realistically loaded web server is going to consistently give
> you tuples that happen to somehow fit into a nice model so that the
> bias benefits your load distribution.
>
> > To minimize the computational overhead, we may consider putting the hash
> > in a per-connection cache in the TCP layer, so it only needs to be computed
> > once. But even with the computation overhead at this moment,
> > the throughput based on the Toeplitz hash is better than with Jenkins:
> > Throughput (Gbps) comparison:
> > #conn   Toeplitz   Jenkins
> > 32      26.6       23.2
> > 64      32.1       23.4
> > 128     29.1       24.1
> >
> You don't need to do that. We already store a random hash value in the
> connection context. If you want to make it non-random then just
> replace that with a simple global counter. This will have the exact
> same effect that you see in your tests without needing any expensive
> computation.
Could you point me to the field in the connection context where this
hash value is stored? Is it computed only once?
Thanks!
- Haiyang
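
For anyone who wants to reproduce the kind of distribution check discussed in this thread, here is a minimal Python sketch of a Toeplitz hash over an IPv4 4-tuple, with a helper that counts how many flows land on each queue when only the source port varies. Several things here are assumptions and not taken from the driver: the 40-byte key is the sample key commonly quoted for RSS verification, the addresses and the destination port 5001 are made up, the helper names are hypothetical, and mapping the hash to a queue with a plain modulo is a simplification of whatever table the driver or host actually uses.

```python
import socket
import struct
from collections import Counter

# Assumed 40-byte sample key (commonly quoted for RSS verification);
# a real device is programmed with its own key.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of data under key.

    For every set bit i of the input (most significant bit first), XOR in
    the 32-bit window of the key that starts at key bit i.
    """
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input length")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack a 4-tuple in network byte order: addresses first, then ports."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )


def queue_histogram(src_ports, num_queues: int = 16) -> Counter:
    """Count flows per queue for one client/server pair (hypothetical values)."""
    counts = Counter()
    for port in src_ports:
        data = flow_tuple("10.0.0.1", "10.0.0.2", port, 5001)
        counts[toeplitz_hash(RSS_KEY, data) % num_queues] += 1
    return counts


if __name__ == "__main__":
    # 128 flows whose source ports increase by a fixed step.
    for queue, flows in sorted(queue_histogram(range(40000, 40128)).items()):
        print(queue, flows)
```

Replacing the range with randomly drawn ports, or swapping in another hash function, gives a quick way to compare the two input models argued about above. It says nothing about per-flow load, which is the point Eric raised.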