netdev - Re: [PATCH 0/1] Rosebush, a new hash table


Date: Sun, 25 Feb 2024 00:51:06 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Matthew Wilcox <willy@...radead.org>
Cc: David Laight <David.Laight@...lab.com>, 
	'Herbert Xu' <herbert@...dor.apana.org.au>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	Thomas Graf <tgraf@...g.ch>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, 
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>, 
	"maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
Subject: Re: [PATCH 0/1] Rosebush, a new hash table

On Sun, Feb 25, 2024 at 05:01:19AM +0000, Matthew Wilcox wrote:
> On Sat, Feb 24, 2024 at 10:18:31PM -0500, Kent Overstreet wrote:
> > On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> > > I remember playing around with the elf symbol table for a browser
> > > and all its shared libraries.
> > > While the hash function is pretty trivial, it really didn't matter
> > > whether you divided by 2^n, 2^n-1 or 'the prime below 2^n'; some hash
> > > chains were always long.
> > 
> > that's a pretty bad hash, even golden ratio hash would be better, but
> > still bad; you really should be using at least jhash.
> 
> There's a "fun" effect; essentially the "biased observer" effect which
> leads students to erroneously conclude that the majority of classes are
> oversubscribed.  As somebody observed in this thread, for some usecases
> you only look up hashes which actually exist.
> 
> Take a trivial example where you have four entries unevenly distributed
> between two buckets, three in one bucket and one in the other.  Now 3/4
> of your lookups hit in one bucket and 1/4 in the other bucket.
> Obviously it's not as pronounced if you have 1000 buckets with 1000
> entries randomly distributed between the buckets.  But that distribution
> is not nearly as even as you might expect:
> 
> $ ./distrib
> 0: 362
> 1: 371
> 2: 193
> 3: 57
> 4: 13
> 5: 4
> 
> That's using lrand48() to decide which bucket to use, so not even a
> "quality of hash" problem, just a "your mathematical intuition may not
> be right here".
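
The "biased observer" point can be made concrete with the histogram quoted
above. The sketch below re-weights that histogram by entries instead of by
buckets, assuming every stored key is looked up equally often (that
assumption, and all function names, are illustrative and not from the thread).

```python
# Bucket histogram from the ./distrib output: chain length -> number of buckets.
buckets_by_len = {0: 362, 1: 371, 2: 193, 3: 57, 4: 13, 5: 4}

def entries_by_chain_len(hist):
    """Chain length -> number of entries stored in chains of that length."""
    return {k: k * n for k, n in hist.items() if k > 0}

def mean_chain_len_seen_by_lookup(hist):
    """Average chain length weighted by entries, i.e. by successful lookups."""
    total = sum(k * n for k, n in hist.items())
    return sum(k * k * n for k, n in hist.items()) / total

entries = entries_by_chain_len(buckets_by_len)
total_entries = sum(entries.values())
long_buckets = sum(n for k, n in buckets_by_len.items() if k >= 2)
long_entries = sum(e for k, e in entries.items() if k >= 2)

print(entries)
print(long_buckets / sum(buckets_by_len.values()))  # share of buckets holding 2+ entries
print(long_entries / total_entries)                 # share of lookups landing in them
print(mean_chain_len_seen_by_lookup(buckets_by_len))
```

Only 267 of the 1000 buckets hold two or more entries, but 629 of the 1000
entries live in those buckets, so a lookup of an existing key sees a chain of
length two or more well over half the time.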

well, golden ratio hash - hash_32(i, 32)
0: 368
1: 264
2: 368
3: 0
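
For reference, a rough sketch of that kind of test. The multiplier and the
shift follow the kernel's hash_32(); how the 32-bit result was reduced to a
bucket index is not stated here, so the modulo step, the 1000-bucket table and
the 1000 sequential keys are all assumptions, and the output is not guaranteed
to match the numbers above.

```python
from collections import Counter

GOLDEN_RATIO_32 = 0x61C88647  # multiplier used by the kernel's hash_32()

def hash_32(val, bits):
    """Multiply by the golden ratio constant, keep the top `bits` bits."""
    return ((val * GOLDEN_RATIO_32) & 0xFFFFFFFF) >> (32 - bits)

def occupancy_histogram(keys, n_buckets):
    """Chain length -> number of buckets, including empty buckets."""
    per_bucket = [0] * n_buckets
    for key in keys:
        # Assumption: reduce the full 32-bit hash with a plain modulo.
        per_bucket[hash_32(key, 32) % n_buckets] += 1
    return dict(sorted(Counter(per_bucket).items()))

hist = occupancy_histogram(range(1000), 1000)
for chain_len, n in hist.items():
    print(f"{chain_len}: {n}")
```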

but your distribution actually is accurate in general; golden ratio hash
is really nice for sequential integers. the actual problem with your test
is that you're testing 100% occupancy - no one does that.
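
The random-placement case at 100% occupancy is easy to re-run. A minimal
sketch, using Python's random module as a stand-in for lrand48() (so the exact
counts will differ from the quoted run); the function name and the seed are
illustrative.

```python
import random
from collections import Counter

def random_occupancy_histogram(n_entries, n_buckets, rng):
    """Drop n_entries into uniformly random buckets.

    Returns chain length -> number of buckets with that many entries.
    """
    per_bucket = [0] * n_buckets
    for _ in range(n_entries):
        per_bucket[rng.randrange(n_buckets)] += 1
    return dict(sorted(Counter(per_bucket).items()))

rng = random.Random(1)  # fixed seed so repeated runs agree
hist = random_occupancy_histogram(1000, 1000, rng)
for chain_len, n in hist.items():
    print(f"{chain_len}: {n}")
```

Changing n_entries to 750 gives the same experiment at 75% occupancy.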

75% occupancy, siphash:
0: 933
1: 60
2: 6
3: 1
4: 0

that looks about right to me.
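
A quick sanity check for histograms like these is the Poisson approximation:
with uniformly random placement at load factor L (entries per bucket), the
expected share of buckets holding k entries is about exp(-L) * L^k / k!. The
sketch below is an illustration under that approximation; the function names
are made up here and nothing in it comes from the test programs in this thread.

```python
import math

def poisson_histogram(n_buckets, load, max_len=5):
    """Expected number of buckets with 0..max_len entries (Poisson approximation)."""
    return [n_buckets * math.exp(-load) * load ** k / math.factorial(k)
            for k in range(max_len + 1)]

def implied_entries(hist):
    """Total entries accounted for by a histogram given as a list indexed by chain length."""
    return sum(k * n for k, n in enumerate(hist))

for load in (1.0, 0.75, 0.075):
    print(load, [round(x) for x in poisson_histogram(1000, load)])

print(implied_entries([362, 371, 193, 57, 13, 4]))  # lrand48 histogram
print(implied_entries([933, 60, 6, 1, 0]))          # siphash histogram
```

At load 1.0 this gives roughly 368, 368, 184, 61, 15, 3, which is close to the
lrand48 numbers. The siphash histogram accounts for 60 + 2*6 + 3*1 = 75
entries across 1000 buckets, so that run may correspond to a load factor
nearer 0.075 than 0.75; this is an inference from the printed numbers only.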