Date: Mon, 20 Apr 2015 23:36:09 +0200
From: Dmitry Khovratovich <>
To: "" <>
Subject: Re: [PHC] "Attack on the iterative compression function"

Hi Alexander,

I did add an extra 1/16 of memory to each tradeoff attack (line 613 of
the code), so you should not add it again.

The program outputs different C(q) values because it runs multiple tests
in which the addresses are determined randomly. The memory saving and the
penalty are different for each iteration, and I just average them over
100 tests.

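To make the averaging concrete, here is a small self-contained toy (this is
not the tradeoff simulator itself; the cost model below is invented purely
for illustration): each trial draws its back-references at random, the
per-trial penalty varies, and only the mean over 100 trials is printed.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BLOCKS 1024   /* toy memory size */
#define TRIALS 100    /* same averaging count as in the discussion */

/* one simulated run with freshly randomized back-references */
static double one_trial(int q)
{
    double penalty = 0.0;
    for (int i = 1; i < BLOCKS; i++) {
        int ref = rand() % i;      /* random back-reference of block i */
        if (ref % q != 0)          /* referenced block was not stored */
            penalty += ref % q;    /* toy cost: distance back to a stored block */
    }
    return penalty / BLOCKS;
}

int main(void)
{
    srand((unsigned)time(NULL));
    for (int q = 2; q <= 6; q++) {
        double sum = 0.0;
        for (int t = 0; t < TRIALS; t++)
            sum += one_trial(q);
        printf("q = %d: average toy penalty %.2f\n", q, sum / TRIALS);
    }
    return 0;
}
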
The parameter WIDTH_SIZE/3q determines the number of the most
expensive blocks. They are so expensive that we store any block that
refers to them; that is the idea of the ranking method.

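For illustration, a toy version of this ranking rule (not the actual
simulator code; the arrays, the cost estimates and the placeholder main()
are all made up): rank the blocks by estimated recomputation cost, mark the
WIDTH_SIZE/(3*q) most expensive ones, and store every block whose
back-reference points into that set.

#include <stdio.h>
#include <stdlib.h>

#define WIDTH_SIZE 1024

static double cost[WIDTH_SIZE];      /* estimated recomputation cost of each block */
static int    ref[WIDTH_SIZE];       /* back-reference of each block */
static int    expensive[WIDTH_SIZE]; /* 1 if the block is among the most expensive */
static int    stored[WIDTH_SIZE];    /* 1 if we decide to keep the block */

static int by_cost_desc(const void *a, const void *b)
{
    double ca = cost[*(const int *)a], cb = cost[*(const int *)b];
    return (ca < cb) - (ca > cb);    /* sort indices by decreasing cost */
}

static void rank_and_store(int q)
{
    int order[WIDTH_SIZE];
    for (int i = 0; i < WIDTH_SIZE; i++)
        order[i] = i;
    qsort(order, WIDTH_SIZE, sizeof(order[0]), by_cost_desc);

    /* mark the WIDTH_SIZE/(3*q) most expensive blocks */
    for (int i = 0; i < WIDTH_SIZE / (3 * q); i++)
        expensive[order[i]] = 1;

    /* store any block that refers to one of them */
    for (int i = 0; i < WIDTH_SIZE; i++)
        if (expensive[ref[i]])
            stored[i] = 1;
}

int main(void)
{
    srand(1);
    for (int i = 1; i < WIDTH_SIZE; i++) {
        ref[i]  = rand() % i;               /* random back-reference */
        cost[i] = (double)(rand() % 1000);  /* placeholder cost estimate */
    }
    rank_and_store(4);
    int kept = 0;
    for (int i = 0; i < WIDTH_SIZE; i++)
        kept += stored[i];
    printf("stored %d of %d blocks\n", kept, WIDTH_SIZE);
    return 0;
}
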
I should add that C(q) is actually overestimated by the program.
First, it does not take into account that the total number of blocks
is limited (2^20 for 1 GB, for example). Second, it counts every
block as many times as it appears in different branches of the
recomputation tree; a cleverer tradeoff method would not recompute
the same block so many times.

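To see why the second point matters, consider this toy model (again, not the
simulator; the reference pattern and the keep-every-4th rule are arbitrary):
the tree-style count charges a block once per branch it appears in, while a
memoizing attacker recomputes it only once.

#include <stdio.h>
#include <stdlib.h>

#define N 64

static int ref[N];     /* block i also depends on a random earlier block ref[i] */
static int stored[N];  /* 1 = block kept in memory */
static int done[N];    /* memoization flag */

/* cost as the recomputation tree counts it: every branch paid separately */
static double tree_cost(int i)
{
    if (i <= 0 || stored[i])
        return 0.0;
    return 1.0 + tree_cost(i - 1) + tree_cost(ref[i]);
}

/* cost with memoization: each missing block is recomputed at most once */
static double memo_cost(int i)
{
    if (i <= 0 || stored[i] || done[i])
        return 0.0;
    done[i] = 1;
    return 1.0 + memo_cost(i - 1) + memo_cost(ref[i]);
}

int main(void)
{
    srand(1);
    for (int i = 1; i < N; i++) {
        ref[i] = rand() % i;
        stored[i] = (i % 4 == 0);   /* keep every 4th block */
    }
    printf("tree-counted cost: %.0f\n", tree_cost(N - 1));
    printf("memoized cost:     %.0f\n", memo_cost(N - 1));
    return 0;
}
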
Regarding your comment on the average memory use: the program does not
use all the memory immediately. For every block it decides whether to
store it or not, and it never deletes the blocks it has stored.

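A tiny illustration of the resulting memory profile (toy numbers, not the
simulator's output): since blocks are only ever added, the footprint ramps
up over the run and the time-averaged number of stored blocks stays well
below the final count.

#include <stdio.h>
#include <stdlib.h>

#define BLOCKS 4096

int main(void)
{
    srand(1);
    long stored_now = 0, stored_sum = 0;
    for (int i = 0; i < BLOCKS; i++) {
        if (rand() % 4 == 0)      /* store roughly every 4th block */
            stored_now++;
        stored_sum += stored_now; /* footprint while processing block i */
    }
    printf("final stored blocks:   %ld\n", stored_now);
    printf("average stored blocks: %.1f\n", (double)stored_sum / BLOCKS);
    return 0;
}
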
On Sun, Apr 19, 2015 at 2:49 AM, Solar Designer <> wrote:
> On Sat, Apr 18, 2015 at 11:42:06PM +0300, Solar Designer wrote:
>> Dmitry's "Tradeoff for Yescrypt" short paper does acknowledge that for
>> their factor of 6 memory reduction "in practice the computational
>> overhead will be probably too large".  Per the first table, it'll be
>> 2^19 for factor of 5, and it is not even listed for factor of 6.
>
> I am playing with Dmitry's program now (thanks again, Dmitry!)
> Turns out the computation penalty is over a billion for 1/6+5/96 =
> ~21.9% memory (this is actually higher than 1/5).
>
> $ cat penalties-yes-multi.log
> Average Penalty for fraction 2 (104 close values): 2.96  Latency: 2.55 Read: 2.96
> Average Penalty for fraction 3 (133 close values): 26.69  Latency: 6.90 Read: 26.69
> Average Penalty for fraction 4 (176 close values): 1262.15  Latency: 17.37 Read: 1262.15
> Average Penalty for fraction 5 (233 close values): 416359.53  Latency: 40.80 Read: 416359.53
> Average Penalty for fraction 6 (323 close values): 1620898688.00  Latency: 86.20 Read: 1620898688.00
> Average Penalty for fraction 7 (499 close values): 84016875175936.00  Latency: 167.21 Read: 84016875175936.00
> Average Penalty for fraction 8 (4 close values): 1966579893927936.00  Latency: 258.72 Read: 1966579893927936.00
>
> The above is for passes = 4.0/3.0.  Dmitry originally set it to passes =
> 1.33, which produces significantly lower penalties for 1/6 and on:
>
> Average Penalty for fraction 2 (101 close values): 2.92  Latency: 2.53 Read: 2.92
> Average Penalty for fraction 3 (130 close values): 25.27  Latency: 6.79 Read: 25.27
> Average Penalty for fraction 4 (174 close values): 1254.56  Latency: 17.49 Read: 1254.56
> Average Penalty for fraction 5 (260 close values): 413326.00  Latency: 41.03 Read: 413326.00
> Average Penalty for fraction 6 (315 close values): 1233897728.00  Latency: 84.64 Read: 1233897728.00
> Average Penalty for fraction 7 (510 close values): 59750851018752.00  Latency: 164.19 Read: 59750851018752.00
> Average Penalty for fraction 8 (6 close values): 1658737978769408.00  Latency: 262.83 Read: 1658737978769408.00
>
> For those who haven't seen Dmitry's short paper, the latency figures
> reported here are prior to the BlockMix-level TMTO attack, which reduces
> those (and that's the main problem we're discussing).  The computation
> and read penalties will be slightly higher with the BlockMix-level TMTO
> attack added.
>
>> I am more concerned about the attack at lower TMTO ratios, 1/2 to 1/4
>> inclusive, where the required number of parallel cores isn't necessarily
>> prohibitive per se.  The latency overhead on putting those cores to use
>> (including extra latency coming from the increased memory bandwidth
>> usage on needing to fetch, if I am correct, roughly twice more data at
>> the 1/4 ratio) may be more of a problem, and may defeat the attack in
>> some cases.
>
> When I wrote "the increased memory bandwidth usage [...] roughly twice
> more", I counted only the BlockMix-level TMTO's extra reads (one tree
> branch's worth of sub-blocks in addition to one full block), and did not
> count the SMix-level TMTO's extra reads, even though the two tradeoffs
> only work well together.  So I was wrong.  The increase in required
> memory bandwidth is actually much worse, perhaps over 1000 times for the
> 1/4+3/64 memory, as the program reports.  This is probably enough to
> kill the attack in almost all practical cases.
>
> BTW, I don't see where the computation penalty of 1135 for 1/4 in the
> paper came from.  The program gave me 1254.56 even before I patched the
> higher-precision value in place of 1.33.  Maybe there's precision loss
> elsewhere in the program.  Maybe it should use double instead of float.
> (A quick "#define float double" right after the #include's resulted in
> compile errors, and I'm lazy.)  The 1135 vs. 1255 (or 1262) difference
> does not really matter for this discussion, but it means there may be
> worse precision loss for some other results.
>
> ... rebuilt with -O3 instead of -O2, got 1290.28 there.  Added
> -ffast-math on top of that, got 1244.30.  This program is definitely
> having precision loss issues.
>
> Still it's very helpful to have it.
>
> Alexander

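On the precision loss Alexander mentions: below is a small standalone
demonstration of what single-precision accumulation does once the sums get
large (it only illustrates the effect and is not necessarily what happens
inside the simulator); switching the accumulators to double would avoid it.

#include <stdio.h>

int main(void)
{
    float  f = 0.0f;
    double d = 0.0;
    for (long i = 0; i < 100000000; i++) {  /* 1e8 additions of 1.0 */
        f += 1.0f;
        d += 1.0;
    }
    printf("float  sum: %.1f\n", f);  /* stalls at 16777216 (2^24) */
    printf("double sum: %.1f\n", d);  /* 100000000.0 */
    return 0;
}
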
Best regards,
Dmitry Khovratovich
