linux-kernel - Re: .. anybody know of any filesystems that depend on the exact VFS 'namehash' implementation?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFwHN_4=WOOYVC60kdSfwYautigNwTwUnjQDRa-oTanVog@mail.gmail.com>
Date:	Thu, 1 Mar 2012 14:42:20 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Chris Mason <chris.mason@...cle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: .. anybody know of any filesystems that depend on the exact VFS
 'namehash' implementation?

Ok, guys, here's a *very* preliminary version of the patch I was thinking of.

NOTE! The changes are all totally unconditional, which means that this
patch will break on:

 - any big-endian machine (the "find first NUL/slash byte" and mask
generation really is little-endian only)

   Maybe somebody with a big-endian machine could try to figure out
how the logic would work there to efficiently find the mask of bytes
that are before the first NUL or slash.

 - any hardware that doesn't do good unaligned 'unsigned long' accesses

 - if you enable CONFIG_DEBUG_PAGEALLOC

In addition, I've only *tested* this on x86-64, and I'm pretty sure
that on x86-32 it will at the very least warn about the big 64-bit
constants I use (but I think it should work - they'll just get
truncated, and I tried to use "sizeof(unsigned long)" instead of "8"
etc).

Anyway, on x86-64 it works for me. And my stupid test-case (look up
the same pathname ten million times - I'm explicitly looking up the
*same* pathname to not show the D$ trashing of the hash tables looking
up millions of different names) went from 9.053s to 8.099s.

So that's a 10% improvement on an admittedly extreme load, but the two
functions that were improved (__d_lookup_rcu and link_path_walk)
really *are* the two hottest functions on actual real loads too, so
while my microbenchmark was extreme, it's at least testing something
relevant. And it's the whole microbenchmark that improved by 10%, so
those hot functions themselves each actually improved by about 30%.

Comments?

I agree that the games I play to do things one word at a time aren't
*pretty*, but I did try to split things up into helper functions so
that the code is actually conceptually understandable even if you
can't follow the tricks I play. Staring too much at that new
hash_name() function may turn you mad. Not only is the "find byte in
word" logic subtle, the whole crazy "do { } while ()" loop is written
that way so that gcc doesn't go crazy and try to unroll it once (which
gcc does if you turn it into the more obvious "for ()" loop).

That said, trying to figure out *why* hash_name() works can be a fun
exercise if you're into these kinds of tricks. It was certainly fun to
write.

                     Linus

View attachment "dentry-speed.patch" of type "text/x-patch" (8318 bytes)