lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFw=wwGD=QLmQgfGm+oZR3UyvKBmXdXj+SJtK6pipCe45A@mail.gmail.com>
Date:	Thu, 1 Mar 2012 17:01:52 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: .. anybody know of any filesystems that depend on the exact VFS
 'namehash' implementation?

On Thu, Mar 1, 2012 at 4:46 PM, Andi Kleen <andi@...stfloor.org> wrote:
>
> There should be generally better modern general hash algorithms around,
> like murmur, cityhash or snoopy. Perhaps even the fnv we have in tree,
> but it's somewhat dated by know.
>
> They all have larger code, but if it's really that hot it would be worth
> it.

The quality of our hash function really doesn't seem to be the issue.
In fact, my (other - I've done both cache-hot and cache-cold) profiles
seem to indicate that the only access that ever matters is the first
one (ie the read from the hash table, not the following of the
chains).

I have a dim memory of us playing with the hash function itself long
long ago, and what made a difference was using the right bits from the
"parent" pointer, not the hash of the name itself. That's especially
true since many filenames are really just in the 3-8 character length
thing, so creating a 32-bit hash from that is not going to have lots
of collisions.

We basically have "two hashes": we have the full 32-bit "hash number"
that we compute over the filename (and also compare at lookup time
before we do a name compare), and then we have the "bucket number"
which is the actual hash chain number.

The bucket number is generate from both the filename hash number and
the parent pointer, and then we try to mix bits around with that
GOLDEN_RATIO_PRIME thing. And last time this was an issue, it was the
"mixing bits around" that made a difference to proper bucket use. See
commit 99effef9544e: "[PATCH] dentry and inode cache hash algorithm
performance changes." in the historical BK import by Thomas.

                         Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ