phc-discussions - Re: [PHC] Random Hash Functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5165E696.7050903@defuse.ca>
Date: Wed, 10 Apr 2013 16:24:22 -0600
From: havoc <havoc@...use.ca>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Random Hash Functions

On 04/09/2013 08:26 PM, Marsh Ray wrote:
>> -----Original Message-----
>> From: havoc [mailto:havoc@...use.ca]
>> Sent: Tuesday, April 9, 2013 8:20 PM
>> To: discussions@...sword-hashing.net
>> Subject: [PHC] Random Hash Functions
>>
>> By "random hash algorithm", I don't mean simply a random
>> hash function which you could easily get by prefixing a random salt, I mean
>> really generating random *code*.
> 
> What's the difference?
> 
> Not trying to be dismissive here. I'm wondering how one would go about describing how such a function is fundamentally different than a "static" hash function.
> 
>> Of course, generating a random *secure* cryptographic hash function is NOT
>> easy, since it takes years of design and testing to become confident in just
>> one.
> 
> But the function
> 
>      H = SHA-2-256(SHA-2-512("a") ++ msg1)
> 
> is a "randomly different" function than
> 
>      H' = SHA-2-256(SHA-2-512("b") ++ msg2)
> 
> right?
> 
> In other words, if knowledge of collisions or preimages of H help you to find collisions or preimages of H', then you have broken SHA-2-256 itself.
> 
> But maybe not in the sense that you mean.

I'll take a shot at formalizing what I mean.

Let S be the set of all possible "random functions". Let Imp(F) be the
implementation given when F is selected from S at random. We want:

1. Imp(F) is very close to the fastest way to implement F given the
available instructions.

    ...because if the crackers can find a faster implementation than
what we're actually using, they have an advantage.

2. For all distinct pairs F1, F2 in S (F1 != F2), the optimal
implementation of a hardware device that can compute F1 and F2 for any
inputs is not:

    i.  Smaller than the sum of the sizes of the optimal
        implementations of F1 and F2 (on their own).
    ii. Faster than the "sum" of the speeds of the optimal
        implementations of F1 and F2 (on their own).

    ...because if there is a lot of redundancy amongst the functions,
cheaper and faster hardware crackers can be produced by eliminating that
redundancy.

3. Extend #2 to triples, quadruples, ..., |S|-tuples from S.

4. The number of functions in S is large, i.e. more than 2^128.

Generating random functions like this...
    s = random 32-byte string
    F(input) = SHA256(s ++ input)
...would fail property #2, since the same hardware could be used to
compute all of the possible functions.

Selecting a random function from...
    S = {SHA256, WHIRLPOOL, BLAKE256, JH256, GROESTL256}
...probably satisfies #1, #2, and #3, but fails #4.

> 
> What are some things that CPUs can do that plain old hardwired data-flow logic can't do as easily? Loops and conditional jumps come to mind. Randomly-chosen memory accesses like bcrypt and scrypt. But even CPUs are implemented with the same logic building blocks that are available in FPGAs.
> 
>> However, except for very valuable hashes, attackers probably won't
>> spend a lot of time cryptanalyzing them, so it's OK for them to be not-so-
>> secure.
> 
> Famous last words :-)
> 
> IMHO the idea of a randomly-generated function looks very appealing. But in theory and practice you'd end up paying a substantial price in complexity for a something that was basically impossible to show was helpful at all (if not harmful).
> 

I certainly wouldn't recommend doing it unless we have good assurances
that it's not making it worse and that something like properties #1 to
#4 are satisfied.

It's easy to guarantee that it's not making things too much worse with
the following additional rules:

5. All functions are one-to-one. This guarantees that the functions
don't "lose entropy."

6. No more than half of the total time it takes to hash a password is
spent in the random function. So even if the crackers figure out a way
to completely bypass the random functions, they get no more than a 4x
speedup* (they'd be able to make ASICs now but they'd be no better off
than if the random functions didn't exist in the first place).

It is still possible that the random function could introduce a side
channel attack of some sort.

It must be extremely difficult, maybe even impossible, to satisfy all of
the above properties. But if it can be done, I think it would be
incredibly useful for a password hashing / stretching function.

- Taylor

Footnotes:

* Effectively 4x instead of 2x because the original hashers could have
doubled the iteration count if they knew the random functions were useless.