phc-discussions - Re: [PHC] Computational comparison of 3 schemes...

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 23 Jan 2014 07:03:18 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Computational comparison of 3 schemes...

On Thu, Jan 23, 2014 at 5:44 AM, Christian Forler
<christian.forler@...-weimar.de> wrote:
> On 23.01.2014 00:54, Bill Cox wrote:
>
>> There are very interesting trade-offs.  First, all three algorithms
>> fail completely in timing-attack-resistant mode for an attacker using
>> 1/4 of memory.  All three required no significant recalculations, and
>> an attacker gets to use 1/4 of memory basically for free.  Catena-3
>> only uses 1/4 of the memory, because it reuses the row memory, and
>> there are 4 rows.  Both attacker and defender benefit from the 1/4th
>> memory.
>
> This observation is a little bit unfair. We claim: with (at least) 2^g
> memory unites Catena-3 can be computed with (approx.) 4*2^g hash
> function calls. It is obvious that you can compute Catena-3 in the same
> time when you have 4 times the memory.

I was being pressed to come to dinner, and I felt I needed an e-mail
edit button moments after I pressed send.  Let me restate this
observation.

In timing resistant mode where we do zero reads from password-derived
addresses, there is a natural attack against NoelKDF and what I
understand escrypt to be doing.  An attacker gets a 4X memory
reduction almost for free, as he won't have to do any significant
recalculations.  This attack is so devastating that I think both
algorithms should be rewritten to use only 1/4 the memory, just like
an attacker would do.  This would complicate these algorithms quite a
bit, and result in 4X less memory usage.

In contrast, Catena-3 already is optimized for only using 1/4 of the
memory compared to the graph size, without complicating the algorithm.
 This observation lets me say with confidence that in timing-resistant
mode, Catena-3 is simply the best of the three, without having to add
any disclaimers about using less memory than the others.

Now your implementation still needs to be optimized.  I really do
believe you should have a version optimized to be
multiply-time-limited in the inner loop with a simpler faster hash
function than Blake2.  That would increase your memory useage 10X, so
you'd also have to use larger block sizes.

Please tell me your guys are at least trying this.  If not, would you
mind if I do a "friendly fork" of Catena, minimally modified to be 10X
faster, and compute-time hardened against ASIC attacks?  By friendly
fork, I mean a fork on github where I beg you to pull the changes, and
which goes away if you do.

>> In short, Catena-3 is the natural timing-resistant choice, but it only
>> uses 1/4 of the memory of an equivalent sequential-memory-hard KDF.
>
> Your strategy is
> 1. allocate a huge amount of memory and
> 2. overwrite the majority of it

Actually overwrite all of it exactly once.

> The Catena strategy is:
> 1. allocate a reasonable amount of memory
> 2. overwrite (all of) it multiple times.

Specifically, it's a total of 9 memory accesses in Catena-3, vs 2 in
scrypt, escript, and my simple hack.  I'm not sure why 9... seems like
it should be 8, but that's what I saw in the code.  Also, the hash is
called twice per node, because it's used once per edge rather than
once per node.  There's room for optimization here.

> The former strategy implies that the cost for time and memory are alike
> since we have S(g) * T(g) = G^2. Let assume that S(g) is not equal
> T(g). Then an adversary might be willing to trade some memory units
> for some additional CPU time. Thus, an adversary can  exploit linear
> optimization techniques to compute a sweet spot.

This is sometimes done in reality against script.  Both escript and my
hacked algorithm avoid the worst TMTO that allows for any choice of
compute time vs memory.  NoelKDF is susceptible to a 2X memory/time
trade-off, which is a problem, but even then an attacker pays more
than a linear recomputation penalty.  A 4X attack is impractical, IMO.
 In short, I think there's around a 2X memory usage improvement
obtainable by using unpredictable password-dependent addressing.  The
TMTO is defeated either way.

> You have multiple times elaborated that you do not like the latter
> approach. Fine by me. :-)

I have been vocal about not liking < 20MB, and I still don't like it.
As I've always said before, this is fixable, and not an inherent
problem in the Catena-3 algorithm.  You just need a code monkey.  I'm
a big fury monkey myself.

> But I do not really understand why the former strategy is superior to
> the latter one.

Now that I've played around more with attacks, I think it's really
simple.  There's about a 2X memory penalty in what seems to be
possible if we avoid cache timing attacks.  It's not clear if that's
worth it, though I'm leaning towards saying it is (yes, I'm leaning
towards your algorithm - just not the implementation I've tested).

>> 1 Put all the initial pebbles down over the first N locations if there
>> are N pebbles
>> 2 Add all non-fixed, non-in-use pebbles that have no edges pointing at
>> them from beyond the current runner pebble position to the "pebble
>> group"
>>     2.5 If there are no such pebbles, use a heuristic to add the
>> "best" pebble to the group.
>> 3 When pebbling, take virtual pebbles from the pebble group.  Reduce
>> the "num available" pebbles on the group by 1, but alloc a new pebble
>> rather than removing any from the group
>> 4 When num available reaches 0, delete all the pebbles in the group, and goto 2
>
> Thanks you very much for the evaluation of the three password schemes.
> Great job! Have you planed to publish the source code? I might be useful
> for the evaluation process of the submitted PHC candidates. :-)

The code needs a bit of work.  Instead of command line parameters, I
edit and recompile to change memory size and such.  I'd be happy to
make these changes and run it on all entries.  It's only an upper
bound, but I was honestly surprised to see it pebble NoelKDF in 1/4
memory.  Other authors might find the results useful.

I don't know if the idea came across above.  Basically, if there are
several "good" choices for a pebble to pick up, be lazy and decide
later.

Bill