lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 19 Apr 2015 03:49:34 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] "Attack on the iterative compression function"

On Sat, Apr 18, 2015 at 11:42:06PM +0300, Solar Designer wrote:
> Dmitry's "Tradeoff for Yescrypt" short paper does acknowledge that for
> their factor of 6 memory reduction "in practice the computational
> overhead will be probably too large".  Per the first table, it'll be
> 2^19 for factor of 5, and it is not even listed for factor of 6.

I am playing with Dmitry's program now (thanks again, Dmitry!)

Turns out the computation penalty is over a billion for 1/6+5/96 =
~21.9% memory (this is actually higher than 1/5).

$ cat penalties-yes-multi.log
Average Penalty for fraction 2 (104 close values): 2.96  Latency: 2.55 Read: 2.96
Average Penalty for fraction 3 (133 close values): 26.69  Latency: 6.90 Read: 26.69
Average Penalty for fraction 4 (176 close values): 1262.15  Latency: 17.37 Read: 1262.15
Average Penalty for fraction 5 (233 close values): 416359.53  Latency: 40.80 Read: 416359.53
Average Penalty for fraction 6 (323 close values): 1620898688.00  Latency: 86.20 Read: 1620898688.00
Average Penalty for fraction 7 (499 close values): 84016875175936.00  Latency: 167.21 Read: 84016875175936.00
Average Penalty for fraction 8 (4 close values): 1966579893927936.00  Latency: 258.72 Read: 1966579893927936.00

The above is for passes = 4.0/3.0.  Dmitry originally set it to passes =
1.33, which produces significantly lower penalties for 1/6 and on:

Average Penalty for fraction 2 (101 close values): 2.92  Latency: 2.53 Read: 2.92
Average Penalty for fraction 3 (130 close values): 25.27  Latency: 6.79 Read: 25.27
Average Penalty for fraction 4 (174 close values): 1254.56  Latency: 17.49 Read: 1254.56
Average Penalty for fraction 5 (260 close values): 413326.00  Latency: 41.03 Read: 413326.00
Average Penalty for fraction 6 (315 close values): 1233897728.00  Latency: 84.64 Read: 1233897728.00
Average Penalty for fraction 7 (510 close values): 59750851018752.00  Latency: 164.19 Read: 59750851018752.00
Average Penalty for fraction 8 (6 close values): 1658737978769408.00  Latency: 262.83 Read: 1658737978769408.00

For those who haven't seen Dmitry's short paper, the latency figures
reported here are prior to the BlockMix-level TMTO attack, which reduces
those (and that's the main problem we're discussing).  The computation
and read penalties will be slightly higher with the BlockMix-level TMTO
attack added.

> I am more concerned about the attack at lower TMTO ratios, 1/2 to 1/4
> inclusive, where the required number of parallel cores isn't necessarily
> prohibitive per se.  The latency overhead on putting those cores to use
> (including extra latency coming from the increased memory bandwidth
> usage on needing to fetch, if I am correct, roughly twice more data at
> the 1/4 ratio) may be more of a problem, and may defeat the attack in
> some cases.

When I wrote "the increased memory bandwidth usage [...] roughly twice
more", I counted only the BlockMix-level TMTO's extra reads (one tree
branch's worth of sub-blocks in addition to one full block), and did not
count the SMix-level TMTO's extra reads, even though the two tradeoffs
only work well together.  So I was wrong.  The increase in required
memory bandwidth is actually much worse, perhaps over 1000 times for the
1/4+3/64 memory, as the program reports.  This is probably enough to
kill the attack in almost all practical cases.

BTW, I don't see where the computation penalty of 1135 for 1/4 in the
paper came from.  The program gave me 1254.56 even before I patched the
higher-precision value in place of 1.33.  Maybe there's precision loss
elsewhere in the program.  Maybe it should use double instead of float.
(A quick "#define float double" right after the #include's resulted in
compile errors, and I'm lazy.)  The 1135 vs. 1255 (or 1262) difference
does not really matter for this discussion, but it means there may be
worse precision loss for some other results.

... rebuilt with -O3 instead of -O2, got 1290.28 there.  Added
-ffast-math on top of that, got 1244.30.  This program is definitely
having precision loss issues.

Still it's very helpful to have it.

Alexander

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ