linux-kernel - Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function

Message-ID: <20160530160618.22150.qmail@ns.sciencehorizons.net>
Date:	30 May 2016 12:06:18 -0400
From:	"George Spelvin" <linux@...encehorizons.net>
To:	linux@...encehorizons.net, peterz@...radead.org
Cc:	bfields@...hat.com, linux-kernel@...r.kernel.org,
	torvalds@...ux-foundation.org
Subject: Re: [PATCH v3 06/10] fs/namei.c: Improve dcache hash function

Peter Zijlstra <peterz@...radead.org> wrote:
> On Sat, May 28, 2016 at 03:57:19PM -0400, George Spelvin wrote:
>> +static inline unsigned int fold_hash(unsigned long x, unsigned long y)
>>  {
>> +	y ^= x * GOLDEN_RATIO_64;
>> +	y *= GOLDEN_RATIO_64;
>> +	return y >> 32;
>>  }

> So does it make sense to use that pattern here too?
>
> This code doesn't much care about performance, but wants a decent hash
> from the stack of class keys.
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 81f1a7107c0e..c8498efcd5d9 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -309,10 +309,12 @@ static struct hlist_head chainhash_table[CHAINHASH_SIZE];
>   * It's a 64-bit hash, because it's important for the keys to be
>   * unique.
>   */
> -#define iterate_chain_key(key1, key2) \
> -	(((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
> -	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
> -	(key2))
> +static inline u64 iterate_chain_key(u64 x, u64 y)
> +{
> +	y ^= x * GOLDEN_RATIO_64;
> +	y *= GOLDEN_RATIO_64;
> +	return y;
> +}
>  
>  void lockdep_off(void)
>  {

Not quite.  The fold_hash() you quote is used only on 64-bit systems,
which can be assumed to have a reasonable 64-bit multiply.  On 32-bit
platforms, I avoid using GOLDEN_RATIO_64 at all, since 64x64-bit
multiplies are so expensive.
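
To make the 64-bit/32-bit split concrete, here is a minimal user-space sketch. The constant values are assumptions (as they appear in include/linux/hash.h to my recollection), and fold_hash_64()/fold_hash_32() are illustrative names, not the names used in the patch. The 32-bit flavor uses only 32x32->32 multiplies:

```c
#include <stdint.h>

/* Assumed values; check include/linux/hash.h in your tree. */
#define GOLDEN_RATIO_32 0x61C88647u
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* 64-bit flavor: same shape as the quoted fold_hash(). */
static inline uint32_t fold_hash_64(uint64_t x, uint64_t y)
{
	y ^= x * GOLDEN_RATIO_64;
	y *= GOLDEN_RATIO_64;
	return (uint32_t)(y >> 32);	/* keep the best-mixed high half */
}

/* Illustrative 32-bit flavor: avoids any 64x64-bit multiply. */
static inline uint32_t fold_hash_32(uint32_t x, uint32_t y)
{
	y ^= x * GOLDEN_RATIO_32;
	y *= GOLDEN_RATIO_32;
	return y;
}
```

On a 32-bit machine the first function needs several multiply instructions plus carries per 64-bit product, while the second needs one multiply per line.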

You actually have only 96 bits of input.  The correct prototype is:

static inline u64 iterate_chain_key(u64 key, u32 idx)

If performance mattered, I'd be inclined to use one or two iterations
of the 32-bit HASH_MIX() function, which is specifically designed
to add 32 bits to a 64-bit hash value.

A more thorough mixing would be achieved by __jhash_mix().  Basically:

static inline u64 iterate_chain_key(u64 key, u32 idx)
{
	u32 k0 = key, k1 = key >> 32;

	__jhash_mix(idx, k0, k1);	/* Macro that modifies its arguments! */

	return k0 | (u64)k1 << 32;
}
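
For trying this outside the kernel, a self-contained sketch follows. The rol32() helper and the jhash_mix() macro here are local stand-ins, written from my reading of the lookup3 mixing step in include/linux/jhash.h; treat the rotation constants as an assumption to verify against your tree:

```c
#include <stdint.h>

/* Rotate a 32-bit word left by k bits (0 < k < 32). */
static inline uint32_t rol32(uint32_t w, unsigned int k)
{
	return (w << k) | (w >> (32 - k));
}

/* Local stand-in for __jhash_mix(); modifies all three arguments. */
#define jhash_mix(a, b, c)			\
do {						\
	a -= c;  a ^= rol32(c, 4);  c += b;	\
	b -= a;  b ^= rol32(a, 6);  a += c;	\
	c -= b;  c ^= rol32(b, 8);  b += a;	\
	a -= c;  a ^= rol32(c, 16); c += b;	\
	b -= a;  b ^= rol32(a, 19); a += c;	\
	c -= b;  c ^= rol32(b, 4);  b += a;	\
} while (0)

static inline uint64_t iterate_chain_key(uint64_t key, uint32_t idx)
{
	uint32_t k0 = (uint32_t)key, k1 = (uint32_t)(key >> 32);

	jhash_mix(idx, k0, k1);

	/* Keep the last two words written; the mixed idx is discarded. */
	return k0 | (uint64_t)k1 << 32;
}
```

Note that an all-zero input maps to zero, since every step of the mix is a subtract, add, xor or rotate of zeros.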

(The order of arguments is chosen to preserve the two "most-hashed" values.)

Also, the hppa folks just contacted me to point out that it's an
example of an out-of-order superscalar CPU that *doesn't* have a good
integer multiplier.  For general multiplies, you have to move values
to the FPU, and the code is a pain.

Instead, it has shift-and-add instructions designed to help the compiler
generate multiplies by constants, but large ones like GOLDEN_RATIO_64
are still a pain.

Here's code to take x (in %r26) and multiply it by GOLDEN_RATIO_64,
with the result in %r28:

        shladd,l %r26,1,%r26,%r19
        depd,z %r19,46,47,%r31
        sub %r31,%r19,%r31
        shladd,l %r31,2,%r26,%r31
        shladd,l %r31,2,%r26,%r31
        shladd,l %r31,2,%r31,%r19
        depd,z %r19,60,61,%r31
        sub %r31,%r19,%r31
        depd,z %r31,54,55,%r28
        add,l %r31,%r28,%r31
        shladd,l %r31,3,%r26,%r31
        depd,z %r31,59,60,%r28
        add,l %r31,%r28,%r31
        shladd,l %r31,2,%r26,%r31
        depd,z %r31,60,61,%r28
        sub %r28,%r31,%r28
        depd,z %r28,60,61,%r31
        sub %r31,%r28,%r31
        depd,z %r31,57,58,%r28
        add,l %r31,%r28,%r28
        depd,z %r28,61,62,%r28
        sub %r28,%r26,%r28
        shladd,l %r28,3,%r28,%r28
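
The idea behind such a sequence can be sketched in portable C. This is only an illustration under assumptions: the GOLDEN_RATIO_64 value is the one I recall from include/linux/hash.h, shladd() is a hypothetical helper modeling one shift-and-add instruction, and mul_shift_add() is the naive one-step-per-set-bit method, not the shorter chain the compiler found:

```c
#include <stdint.h>

#define GOLDEN_RATIO_64 0x61C8864680B583EBull	/* assumed value */

/* Model of one shift-and-add step: (a << k) + b, with k in 1..3. */
static inline uint64_t shladd(uint64_t a, unsigned int k, uint64_t b)
{
	return (a << k) + b;
}

/* Naive multiply by a constant: one shift and one add per set bit of c. */
static inline uint64_t mul_shift_add(uint64_t x, uint64_t c)
{
	uint64_t sum = 0;
	unsigned int bit;

	for (bit = 0; bit < 64; bit++)
		if ((c >> bit) & 1)
			sum += x << bit;
	return sum;		/* equals x * c modulo 2^64 */
}
```

As I read it, the first instruction of the listing is shladd(x, 1, x), i.e. 3*x. A compiler gets below one step per set bit by reusing such partial products and by subtracting, e.g. 7*x = (x << 3) - x.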

We're going to work on that, either by finding a better code sequence
or by using the 32-bit version of hash_64() with the nice code I've
already found for multiplies by GOLDEN_RATIO_32.

(But thanks, Peter, for looking at the code!)