Date:	Tue, 17 Mar 2015 12:40:12 +0000
From:	"'tgraf@...g.ch'" <tgraf@...g.ch>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	David Laight <David.Laight@...LAB.COM>,
	David Miller <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [v1 PATCH 1/14] rhashtable: Remove shift from bucket_table

On 03/17/15 at 11:20pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 12:13:42PM +0000, David Laight wrote:
> > From: Thomas Graf 
> > > Sent: 17 March 2015 11:58
> > ...
> > > > Do you really want to double the table size when 0.1% of the buckets
> > > > have a chain length > 4 but still < 16?
> > > 
> > > If we constantly hit that bucket because we are handling just a few
> > > TCP flows it would be worth to double the size & rehash to avoid the
> > > additional cache misses of the linked list.
> > > 
> > > Although a limit of 4 would be too high. Ideally we should resize and
> > > rehash when we add the 2nd entry to a bucket to stay < 100% utilization.
> > > It seems likely though that we'll always have a bucket with >=2
> > > entries so we would end up constantly doubling and rehashing. This is
> > > the only thing that speaks for a table wide nelems counters in my
> > > opinion.
> > 
> > I think you are seriously overestimating the 'efficiency' of the hash function.
> > And not doing the 'birthday paradox' maths at all.
> 
> Agreed.  Thomas, an easy test to do is to pump the integers from
> 0 to 65535 into jhash, then mask it with 65535 and run sort |
> uniq -c | sort -n on it, I think you'll see what we're talking
> about.

I'm not claiming perfect hash functions, and this is exactly why I
think average utilization is not an optimal growth criterion: it
gives only a limited view of the actual chain lengths.

What you describe above is a 100% utilization scenario. Initially
we talked about 0.1% utilization and whether to resize & rehash if a
single chain has length > 4. My answer is: yes, in that case we
should resize & rehash, or at least rehash.
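A quick Poisson back-of-envelope supports that: at 0.1% utilization a
chain of length > 4 is essentially impossible with a uniformly
distributing hash, so seeing one is strong evidence of bad hashing.
A sketch (the 65536-bucket table size is just an illustrative
assumption; `expm1`/`log1p` avoid losing the tiny probabilities to
floating-point cancellation):

```python
import math

def p_chain_ge(k, lam):
    # Upper tail of Poisson(lam): probability a given bucket holds
    # >= k entries. Summed from the tail to avoid cancellation.
    return sum(math.exp(-lam) * lam**i / math.factorial(i)
               for i in range(k, 50))

n_buckets = 65536

# 0.1% utilization: a chain of length 5 anywhere in the table is
# astronomically unlikely under a uniform hash.
p_any_low = -math.expm1(n_buckets * math.log1p(-p_chain_ge(5, 0.001)))

# 100% utilization: some chain of length >= 5 is all but certain.
p_any_full = -math.expm1(n_buckets * math.log1p(-p_chain_ge(5, 1.0)))

print(p_any_low)   # tiny (~1e-13 range)
print(p_any_full)  # essentially 1
```

So the same "chain > 4" signal means completely different things at
0.1% and at 100% utilization, which is the distinction being drawn
here.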

My point here is that a chain length of 4 may already be a serious
performance bottleneck, and that it might be worth trying to detect
a bad hash distribution and fix it at an earlier stage, while ruling
out the possibility of endless rehashes.
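For what it's worth, Herbert's suggested experiment can be
approximated in userspace. jhash is kernel-only, so the sketch below
substitutes a salted BLAKE2 digest (assumption: any well-mixed hash
shows the same birthday-paradox behaviour); it hashes the integers
0..65535, masks with 65535, and tallies the resulting chain lengths:

```python
import hashlib
from collections import Counter

N = 65536
buckets = Counter()
for i in range(N):
    # Stand-in for jhash: 8-byte BLAKE2 digest of the key,
    # masked down to the table size as in the thread.
    h = int.from_bytes(hashlib.blake2b(i.to_bytes(4, "little"),
                                       digest_size=8).digest(), "little")
    buckets[h & (N - 1)] += 1

chain_lengths = Counter(buckets.values())
empty = N - len(buckets)
print("empty buckets:", empty)  # expectation ~ N/e for an ideal hash
for length in sorted(chain_lengths):
    print(f"chain length {length}: {chain_lengths[length]} buckets")
```

Even with a uniform hash at 100% utilization, roughly a third of the
buckets stay empty and chains of length 5 or more show up, which is
the birthday-paradox effect David is pointing at.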
