lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 27 Oct 2009 16:41:52 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Stephen Hemminger <shemminger@...tta.com>
cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Stephen Hemminger <stephen.hemminger@...tta.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Octavian Purdila <opurdila@...acom.com>,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH] dcache: better name hash function



On Tue, 27 Oct 2009, Stephen Hemminger wrote:
> 
> Going back to basics. Run tests across different input sets.
> Dropping off the slow ones like crc, md5, ...
> Not using jhash because it doesn't have good character at a time
> interface.
> 
> Also, the folding algorithm used matters. Since the kernel
> already uses hash_long() to fold back to N bits, all the
> tests were rerun with that.

Yeah, the 'hash_long()' folding matters for anything that doesn't multiply 
big some big number to spread the bits out, because otherwise the bits 
from the last character hashed will always be in the low bits.

That explains why our current hash looked so bad with your previous code.

>From your numbers, I think we can dismiss 'kr_hash()' as having horrible 
behavior with names like pppXXX (and that isn't just a special case: it's 
also noticeably worse for your /home directory case, which means that the 
bad behavior shows up in practice too, not just in some special cases).

'elf' and 'pjw' don't have quite the same bad case, but the stddev for the 
pppXXX cases are still clearly worse than the other alternatives. They 
also seem to always be slower than what we already have. 

The 'fnv32' algorithm gets fairly good behavior, but seems bad on Itanium. 
Looks like it depends on a fast multiplication unit, and all even your 
"slow" ULV chip seems to be a Core2 one, so all your x86 targets have 
that. And our current name hash still actually seems to do better in all 
cases (maybe I missed some case) even if fnv32 is slightly faster on x86.

>From your list 'string10' seems to get consistently good results and is at 
or near the top of performance too. It seems to be the one that 
consistently beats 'full_name_hash()' both in performance and in behavior 
(string_hash17/31 come close, but aren't as clearly better performing).

But I haven't actually seen the hashes. Maybe there's something that makes 
string10 bad?

Regardless, one thing your new numbers do say is that our current hash 
really isn't that bad.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ