Date: Sat, 11 Apr 2015 14:53:28 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Pufferfish

Hi Jeremi,

Thank you for the prompt response.

On Sat, Apr 11, 2015 at 03:31:09AM -0700, epixoip@...dshell.nl wrote:
> Both the salt and the password are hashed with HMAC-SHA512 before doing anything blowfish-esque. Both the password and the salt are used in constructing both the s-boxes and the key. The salt is hashed, then the password is hashed with the salt to initialize the state, then the state is iterated depending on m_cost, and then the password is hashed again with the resulting state to generate the key.
> 
> salt_hash = hmac_sha512 (P[0], salt)
> state = hmac_sha512 (salt_hash, password)
> iterate over the state with hmac_sha512 4 * (2^(m_cost + 5) / 64) times
> key_hash = hmac_sha512 (state, password)
> expandkey (salt_hash, key_hash)   // this is where the blowfishy stuff starts
> 
> So initializing the key with an m_cost of 8 results in hashing the password with 514 rounds of hmac_sha512, which is /roughly/ equivalent to the security provided by sha512crypt with 1028 rounds. Not exactly an apples-to-apples comparison here, but that's the closest comparison I can think of. 

This may actually be excessive use of SHA-512 for my taste, reducing
the effectiveness of the bcrypt-like defense a little.

But this doesn't appear to achieve what you claim: that Pufferfish's
own cryptographic properties have no dependency on the cryptographic
properties of your modified Blowfish.

Suppose a "catastrophic failure" of your modified Blowfish is such that
it produces all-zero outputs all the time.  It does not, but the words
"catastrophic failure" should cover even, well, catastrophic failures
like this.  Would Pufferfish still be cryptographically secure in this
case?  I think not.

> > Another deviation from that claim is in how Pufferfish produces longer
> > than 64 bytes KDF output (when requested to do so).  You rely on your
> > modified Blowfish actually working reasonably for that.  Post-processing
> > of the output sub-blocks with HMAC-SHA512 wouldn't help if your modified
> > Blowfish e.g. leaves ctext unchanged between iterations of the
> > "for ( i = 0; i < blockcnt; i++ )" loop (e.g., because the S-boxes are
> > all-zero by that time).
> 
> I'm not sure I follow. I'll have to re-read this portion in the morning, but perhaps in the meantime you can elaborate on this. For example, I'm not sure why ctext would be unchanged between iterations, nor why the S-boxes would be all zeros. I agree this would be very bad, but I'm not sure how this scenario arises. 

It almost certainly would not arise in practice.  However, you're making
a bold claim in the specification document, and what I found in the
pseudocode and the reference code is inconsistent with that claim.

(I am making a very similar claim in yescrypt's specification, but as
far as I'm aware yescrypt actually achieves this.)

> > Of course, I don't expect failures like that to actually occur, since
> > your modified Blowfish looks reasonable, but my point is that Pufferfish
> > does rely on it for its cryptographic properties.  I'm afraid you'll
> > have to remove that claim from the spec, unless/until further tweaks are
> > possibly allowed (which, as you're aware, is currently not planned).
> 
> This statement can be revised in the spec.

Yes, and I think you should.  Ideally, you'd tweak Pufferfish such that
the statement would be true, but we're outside of the tweaks period.

> > What's Pufferfish's maximum supported m_cost?  This suggests it's 11:
> > [...]
> > and I think it corresponds to only 2 MiB.  Is this right?
> > [...]
> > Somehow Milan benchmarked it to much larger sizes, likely invoking
> > undefined behavior.
> > [...]
> > (or do you intentionally limit to 2 MB?)
> 
> ... Wow, you're right. I certainly never intended to limit it to 11, but it is indeed limited to 11! I actually thought that m_cost = 12 would result in x >> -1 which I thought was equal to x << 1. While this does seem consistent on the systems I've tested on, you're right, it is indeed undefined:
> 
> "If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined."

Yes.  How were you testing?  My guess is that the "x >> (unsigned)-1"
being equal to "x << 1" behavior may be a result of evaluation at
compile-time, perhaps of some trivial test case.  Here's what I am
seeing on i386 and x86_64 with gcc 4.6.x:

$ cat s.c
#include <stdint.h>
#include <stdio.h>

static uint64_t shift(uint64_t x, unsigned int s)
{
        return x >> (16 - s);
}

int main(void)
{
        printf("%llx\n", (unsigned long long)shift(0x1234567890abcdefULL, 17));
        return 0;
}
$ gcc s.c -o s -O2 -s -Wall
$ ./s
2468acf121579bde
$ gcc s.c -o s -O0 -s -Wall
$ ./s
0

In the -O2 case, the shift gets evaluated at compile time.  When I
disable optimization, it gets evaluated at run time, and the result
turns out to be 0.  Then I change 0x1234567890abcdefULL to
0x9234567890abcdefULL, and the result is 1, as I would expect.  So the
shift gets interpreted as ">> 63" here, leaving only one data bit.
This is pretty close to a "catastrophic failure", I'd say.  It should
also result in "wrong" speed numbers in benchmarks (actual ones would be
worse once you fix the bug, since the bug happens to improve the
locality of reference).

FWIW, -O1 results in different behavior on i386 (works as ">> 63") vs.
x86_64 (works as "<< 1", obviously at compile time) for me.  But this
really could be anything, since it's undefined.  For example, the
result could also be 0 all the time, if a given CPU doesn't reduce the
shift amount mod 64 and simply shifts everything out (apparently, PPC
commonly or always(?) does that), or if the compiler omits the code
entirely as an optimization since it's UB anyway (hopefully it doesn't
know sbox_words at compile time and so won't do that, although in
theory it could specialize the code for the 12 valid cases only).
That's even more of a "catastrophic failure", I'd say.

> While a 2MiB limit isn't necessarily tragic -- that is actually quite a bit of memory for Pufferfish -- it's certainly not what I had intended, and I definitely don't like it. I wish it would have been caught sooner :(

Yeah.  I think it would have been caught if you had included test
vectors for such sizes and a "make check".  Oh, wait, maybe the
failures on big-endian that people detected actually included this
issue.  This suggests that you should have made your code work on
big-endian sooner, so that other issues like this would have been
spotted as well.

BTW, in the v0 submission you had a JtR patch (with test vectors in it)
and an optimized implementation.  IIRC, you didn't have time to update
them for v1 by the tweaks deadline, so you dropped them - perhaps it's
time to bring them back in a v2 submission (no new tweaks, just these
additional implementations re-added, updated for the tweaks you had
already put into v1).

> > Maybe use:
> > 
> > static uint64_t pf_f ( puf_ctx *ctx, uint64_t x )
> > {
> >     return (( ctx->S[0][( x >> ( 64 - ctx->log2_sbox_words ) )] ^
> >               ctx->S[1][( x >> 35 ) & ( ctx->sbox_words - 1 )]) +
> >               ctx->S[2][( x >> 19 ) & ( ctx->sbox_words - 1 )]) ^
> >               ctx->S[3][( x >> 3 ) & ( ctx->sbox_words - 1 )];
> > }
> 
> I like this, but it might be too late to change. I guess it depends on if it is viewed as a bug fix, or as a tweak. I personally view it as a bug fix, but others may disagree. If it is too late to change, it could likely still be tweaked after the competition if it is selected as a winner.

I think PHC finalists can definitely be tweaked if they are _not_
selected as winners.  If a finalist is selected as a winner, then it's
up to the panel to decide whether/which tweaks to permit (and still call
it a winner).

FWIW, I already want to make one really minor tweak to yescrypt - a
one-liner having to do with hash upgrades for t > 0 hashes.  This
isn't critical at all (everything works as it is supposed to), but it
would be nice (things would work better) if there's another tweaks
phase in PHC... or if yescrypt is not selected as a winner.  For now,
I refrain from making changes.

Alexander
