Date:	Tue, 17 Mar 2015 12:40:12 +0000
From:	"'tgraf@...g.ch'" <tgraf@...g.ch>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	David Laight <David.Laight@...LAB.COM>,
	David Miller <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table

On 03/17/15 at 11:20pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 12:13:42PM +0000, David Laight wrote:
> > From: Thomas Graf 
> > > Sent: 17 March 2015 11:58
> > ...
> > > > Do you really want to double the table size when 0.1% of the buckets
> > > > have a chain length > 4 but still < 16?
> > > 
> > > If we constantly hit that bucket because we are handling just a few
> > > TCP flows it would be worth to double the size & rehash to avoid the
> > > additional cache misses of the linked list.
> > > 
> > > Although a limit of 4 would be too high. Ideally we should resize and
> > > rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
> > > It seems likely though that we'll always have a bucket with >=2
> > > entries so we would end up constantly doubling and rehashing. This is
> > > the only thing that speaks for a table wide nelems counters in my
> > > opinion.
> > 
> > I think you are seriously overestimating the 'efficiency' of the hash function.
> > And not doing the 'birthday paradox' maths at all.
> 
> Agreed.  Thomas, an easy test to do is to pump the integers from
> 0 to 65535 into jhash, then mask it with 65535 and run sort |
> uniq -c | sort -n on it, I think you'll see what we're talking
> about.

I'm not claiming perfect hash functions, and this is exactly why I
think average utilization is not an optimal growth criterion: it
gives only a limited view of the actual chain lengths.

What you describe above is a 100% utilization scenario. Initially
we talked about 0.1% utilization and whether to resize & rehash if a
single chain has length > 4. My answer is: yes, in that case we
should resize & rehash, or at least rehash.
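A quick Poisson back-of-envelope supports that: at 0.1% utilization a
chain of length > 4 is essentially impossible with a uniformly
distributing hash, so seeing one is strong evidence of bad hashing.
A sketch (the 65536-bucket table size is just an illustrative
assumption; `expm1`/`log1p` avoid losing the tiny probabilities to
floating-point cancellation):

```python
import math

def p_chain_ge(k, lam):
    # Upper tail of Poisson(lam): probability a given bucket holds
    # >= k entries. Summed from the tail to avoid cancellation.
    return sum(math.exp(-lam) * lam**i / math.factorial(i)
               for i in range(k, 50))

n_buckets = 65536

# 0.1% utilization: a chain of length 5 anywhere in the table is
# astronomically unlikely under a uniform hash.
p_any_low = -math.expm1(n_buckets * math.log1p(-p_chain_ge(5, 0.001)))

# 100% utilization: some chain of length >= 5 is all but certain.
p_any_full = -math.expm1(n_buckets * math.log1p(-p_chain_ge(5, 1.0)))

print(p_any_low)   # tiny (~1e-13 range)
print(p_any_full)  # essentially 1
```

So the same "chain > 4" signal means completely different things at
0.1% and at 100% utilization, which is the distinction being drawn
here.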

My point here is that a chain length of 4 may already be a serious
performance bottleneck, and that it might be worth trying to detect
a bad hash distribution and fix it at an earlier stage, while ruling
out the possibility of endless rehashes.
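For what it's worth, Herbert's suggested experiment can be
approximated in userspace. jhash is kernel-only, so the sketch below
substitutes a salted BLAKE2 digest (assumption: any well-mixed hash
shows the same birthday-paradox behaviour); it hashes the integers
0..65535, masks with 65535, and tallies the resulting chain lengths:

```python
import hashlib
from collections import Counter

N = 65536
buckets = Counter()
for i in range(N):
    # Stand-in for jhash: 8-byte BLAKE2 digest of the key,
    # masked down to the table size as in the thread.
    h = int.from_bytes(hashlib.blake2b(i.to_bytes(4, "little"),
                                       digest_size=8).digest(), "little")
    buckets[h & (N - 1)] += 1

chain_lengths = Counter(buckets.values())
empty = N - len(buckets)
print("empty buckets:", empty)  # expectation ~ N/e for an ideal hash
for length in sorted(chain_lengths):
    print(f"chain length {length}: {chain_lengths[length]} buckets")
```

Even with a uniform hash at 100% utilization, roughly a third of the
buckets stay empty and chains of length 5 or more show up, which is
the birthday-paradox effect David is pointing at.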
