Date: Sat, 18 Apr 2015 23:42:06 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] "Attack on the iterative compression function"

On Fri, Apr 17, 2015 at 10:05:47AM -0700, Bill Cox wrote:
> Here's the outrageous claim they make against Yescrypt:
> 
>   for 1/4 memory, Yescrypt has a 1/2 "Time-memory product"
> 
> In their previous table, they say that at a 1/4 memory attack, the attacker
> must do 1135 times more computation.  The time*memory defense as used by
> the whole world other than the Argon team is therefore 283.75.  This paper
> is off by a factor of 567.5!
> 
> I do not consider this a weakness of Yescrypt.  I wish the Argon team would
> start using proper terminology.

As I'm sure you realize, the discrepancy is the difference between
computation time and best possible real time (assuming unlimited
parallelism and no latency overhead on putting that parallelism to
use): the 1135x figure counts total computation, whereas the claimed
1/2 "time-memory product" is based on recomputation depth.  I think
both metrics make sense and are useful for us to consider.  I
appreciate that analyses by the Argon team list C(q) and D(q)
separately.

Dmitry's "Tradeoff for Yescrypt" short paper does acknowledge that for
their factor of 6 memory reduction "in practice the computational
overhead will be probably too large".  Per the paper's first table, the
computational penalty is 2^19 for a factor of 5, and it is not even
listed for a factor of 6.  Thus, I am more concerned about the attack
at lower TMTO ratios, 1/2 to 1/4 inclusive, where the required number
of parallel cores isn't necessarily prohibitive per se.  The latency
overhead on putting those cores to use (including extra latency from
the increased memory bandwidth usage, since, if I am correct, roughly
twice as much data must be fetched at the 1/4 ratio) may be more of a
problem, and may defeat the attack in some cases.

BTW, I think it's 1/4+3/64 memory (the paper mentions this detail, just
not in the table), but that's close: 25% vs. 29.7%.  Another detail is
how the average (rather than peak) memory usage is reduced with TMTO.
The recomputation depth grows the longer the algorithm runs, so the
average (across the running time) is skewed towards a higher fraction
of the TMTO'd peak memory size.  For yescrypt at t=0 without TMTO, the
memory fills up by 3/4 of the running time, so the average usage is 5/8
of the peak.  If the 1/4+3/64 fills up e.g. by 1/2 of the running time
(and does so linearly, even though this is certainly not the case), the
average is 3/4 of that peak, a higher fraction than the non-TMTO 5/8.
So we'd get (1/4+3/64)*3/4 / (5/8) = 35.6% average memory usage under
TMTO relative to the non-TMTO average, which isn't as bad as the 25%
where we started.  That's just an example.  We might want to figure out
the actual number.
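
Here's that arithmetic in a few lines of Python (a minimal sketch of
the linear-fill model above; the 1/2 fill point is the assumed example
from the text, not a measured value):

from fractions import Fraction as F

def avg_usage(peak, fill):
    # Average memory usage over the whole run: the ramp phase averages
    # peak/2 over the `fill` fraction of the running time, and the rest
    # of the run sits at `peak`.
    return peak * (F(1, 2) * fill + (1 - fill))

no_tmto = avg_usage(F(1), F(3, 4))          # fills by 3/4 of runtime
tmto    = avg_usage(F(1, 4) + F(3, 64),     # 1/4+3/64 peak memory
                    F(1, 2))                # assumed to fill by 1/2

print(no_tmto)                # 5/8, as in the text
print(tmto)                   # 57/256
print(float(tmto / no_tmto))  # 0.35625, i.e. the 35.6% figure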

I do consider yescrypt's susceptibility to beneficial (at least under a
simplified theoretical model) tradeoff attacks on its BlockMix a
drawback of the current yescrypt design.  I am thinking of addressing
this through documentation, recommended parameter values (for r and t),
and possibly a tweak (if permitted in PHC, or beyond PHC if yescrypt
isn't selected as a winner).  For example, it can be said (in revised
documentation) that t=0 is recommended when only computation*memory
tradeoffs need to be defeated, and t=1 or higher when also
latency*memory tradeoffs need to be defeated, under the assumption that
the attacker has abundant parallel computing resources (but limited
memory).
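
As a sketch, that guidance might boil down to something like this
(hypothetical helper for illustration only, not part of yescrypt's
API):

# Hypothetical illustration of the documentation guidance above;
# not actual yescrypt API.
def recommended_t(defeat_latency_memory_tradeoffs):
    # t=0 suffices when only computation*memory tradeoffs matter;
    # t >= 1 when latency*memory tradeoffs (an attacker with abundant
    # parallelism but limited memory) must also be defeated.
    return 1 if defeat_latency_memory_tradeoffs else 0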

As to a possible tweak, I'd appreciate attacks on scrypt's shuffling
and on the reverse-order sub-blocks (as I suggested in another message).
While it can be said that both are attacked by storing additional
intermediate sub-blocks, that general statement isn't specific enough
for me to choose the more effective mitigation, nor to document exactly
how effective (or not) it is.

Another option is to adopt Argon2's compression function and introduce
tunable parallelism (optional data dependencies) and pwxform rounds
into it, although I doubt I'd reach bcrypt-like frequency of S-box
lookups there (as I'd need to keep the BLAKE2b rounds as well, for full
diffusion; otherwise this would be pointless).  If so, I'd also replace
the uses of Salsa20 with BLAKE2b, but even then the overall complexity
of a full implementation would increase (as Salsa20 would still be
needed when scrypt compatibility is compiled in).  So it's not an
option I like.  Something like this may be more appropriate for a
revision of Argon2 than for a revision of yescrypt.

Alexander
