netdev - Re: [PATCH] dcache: better name hash function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20091027171035.39e18383@nehalam>
Date:	Tue, 27 Oct 2009 17:10:35 -0700
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Stephen Hemminger <stephen.hemminger@...tta.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Octavian Purdila <opurdila@...acom.com>,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH] dcache: better name hash function

On Tue, 27 Oct 2009 16:41:52 -0700 (PDT)
Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> 
> 
> On Tue, 27 Oct 2009, Stephen Hemminger wrote:
> > 
> > Going back to basics. Run tests across different input sets.
> > Dropping off the slow ones like crc, md5, ...
> > Not using jhash because it doesn't have good character at a time
> > interface.
> > 
> > Also, the folding algorithm used matters. Since the kernel
> > already uses hash_long() to fold back to N bits, all the
> > tests were rerun with that.
> 
> Yeah, the 'hash_long()' folding matters for anything that doesn't multiply 
> big some big number to spread the bits out, because otherwise the bits 
> from the last character hashed will always be in the low bits.
> 
> That explains why our current hash looked so bad with your previous code.
> 
> From your numbers, I think we can dismiss 'kr_hash()' as having horrible 
> behavior with names like pppXXX (and that isn't just a special case: it's 
> also noticeably worse for your /home directory case, which means that the 
> bad behavior shows up in practice too, not just in some special cases).
> 
> 'elf' and 'pjw' don't have quite the same bad case, but the stddev for the 
> pppXXX cases are still clearly worse than the other alternatives. They 
> also seem to always be slower than what we already have. 
> 
> The 'fnv32' algorithm gets fairly good behavior, but seems bad on Itanium. 
> Looks like it depends on a fast multiplication unit, and all even your 
> "slow" ULV chip seems to be a Core2 one, so all your x86 targets have 
> that. And our current name hash still actually seems to do better in all 
> cases (maybe I missed some case) even if fnv32 is slightly faster on x86.
> 
> From your list 'string10' seems to get consistently good results and is at 
> or near the top of performance too. It seems to be the one that 
> consistently beats 'full_name_hash()' both in performance and in behavior 
> (string_hash17/31 come close, but aren't as clearly better performing).
> 
> But I haven't actually seen the hashes. Maybe there's something that makes 
> string10 bad?
> 
> Regardless, one thing your new numbers do say is that our current hash 
> really isn't that bad.
> 
> 		Linus

Agreed. Here is the reduced version of the program.
To run:
  find /home -printf '%f\n' 2>/dev/null | ./htest -n 100

View attachment "htest.c" of type "text/x-c++src" (5220 bytes)