phc-discussions - Re: [PHC] "Attack on the iterative compression function"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 18 Apr 2015 22:52:06 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] "Attack on the iterative compression function"

On Sat, Apr 18, 2015 at 1:42 PM, Solar Designer <solar@...nwall.com> wrote:

> On Fri, Apr 17, 2015 at 10:05:47AM -0700, Bill Cox wrote:
> > Here's the outrageous claim they make against Yescrypt:
> >
> >   for 1/4 memory, Yescrypt has a 1/2 "Time-memory product"
> >
> > In their previous table, they say that at a 1/4 memory attack, the
> attacker
> > must do 1135 times more computation.  The time*memory defense as used by
> > the whole world other than the Argon team is therefore 283.75.  This
> paper
> > is off by a factor of 567.5!
> >
> > I do not consider this a weakness of Yescrypt.  I wish the Argon team
> would
> > start using proper terminology.
>
> As I'm sure you realize, it's the difference between computation time
> and best possible real time (assuming unlimited parallelism and no
> latency overhead on putting that parallelism to use).  I think both
> metrics make sense and are useful for us to consider.  I appreciate that
> analyses by the Argon team list C(q) and D(q) separately.
>

I agree.  The C(q) and D(q) tables are helpful.  It's their conclusion that
an attacker can spend 1/2 as much per guess as a defender based on a TMTO
attack that is simply wrong, and I believe purposely misleading.  We've
equated memory*time to dollars all along, and they redifined it entirely
differently, probably specifically to mislead readers.


> Dmitry's "Tradeoff for Yescrypt" short paper does acknowledge that for
> their factor of 6 memory reduction "in practice the computational
> overhead will be probably too large".  Per the first table, it'll be
> 2^19 for factor of 5, and it is not even listed for factor of 6.


How nice of them to agree that a number of parallel CPUs to large to even
calculate will not be free, zero power, and zero latency :-)  Now, could we
get them to consider the impact at the 1/4 level properly?


> Thus,
> I am more concerned about the attack at lower TMTO ratios, 1/2 to 1/4
> inclusive, where the required number of parallel cores isn't necessarily
> prohibitive per se.


If their C(q) number is right, and the attacker has to have over 1100X more
computation, then I would not worry so much about the 1/4 memory case.
It's more likely that an attacker could find some benefit in the 3/4 to 1/2
memory range, but probably not enough to bother.

The latency overhead on putting those cores to use
> (including extra latency coming from the increased memory bandwidth
> usage on needing to fetch, if I am correct, roughly twice more data at
> the 1/4 ratio) may be more of a problem, and may defeat the attack in
> some cases.
>

The routing delays will also help defeat this attack, though I already
consider it quite dead at the 1/4 TMTO level simply due to the huge power
increase.  1100 cores might literally melt the chip if run at full speed.


> BTW, I think it's 1/4+3/64 memory (the paper mentions this detail, just
> not in the table)


in a pattern that's familiar...


> , but that's close: 25% vs. 29.7%.  Another detail is
> how the average (rather than peak) memory usage is reduced with TMTO.
> The recomputation depth grows the longer the algorithm runs, so the
> average (across the running time) is skewed towards a higher portion of
> the TMTO'd memory size.  For yescrypt at t=0 without TMTO, the memory
> fills up at 3/4 of the running time, and the average usage is 5/8.
>

This is a reasonable balance.

If the 1/4+3/64 fills up e.g. at 1/2 of the running time (and does so
> linearly, even though this is certainly not the case), the average is
> 3/4 of that, which is higher than the non-TMTO 5/8.  So we'd get
> (1/4+3/64)*3/4 / (5/8) = 35.6% average memory usage under TMTO, which
> isn't as bad as the 25% where we started.  That's just an example.
> We might want to figure out the actual number.
>

I would not worry about attackers who are willing to pay over a 1000X
computation penalty to do a 1/4 memory attack.  That's just nuts.  I'd
focus more on the minimal TMTO attacks, but from their data, Yescrypt looks
pretty solid there, too.


> I do consider yescrypt's susceptibility to beneficial (at least under a
> simplified theoretical model) tradeoff attacks on its BlockMix a
> drawback of the current yescrypt design.


The theoretical model is very highly unrealistic.  It's not a case worth
worrying about, IMO.

I am thinking of addressing
> this through documentation, recommended parameter values (for r and t),
> and possibly a tweak (if permitted in PHC, or beyond PHC if yescrypt
> isn't selected as a winner).  For example, it can be said (in revised
> documentation) that t=0 is recommended when only computation*memory
> tradeoffs need to be defeated,


I don't understand.  Are you saying there is a beneficial
computation*memory attack against Yescrypt?  I thought the high C(q)
numbers they reported help disprove that.  Computation increases faster
than memory is reduced at all points in their table.  Since power is
roughly 1/2 of an attacker's budget, it is unlikely that there is any
trade-off that benefits an attacker, who mostly cares about cost per guess.

and t=1 or higher when also
> latency*memory tradeoffs need to be defeated under assumption that the
> attacker has abundant parallel computing resources (but limited memory).
>

I do not think this would be good advice.  In virtually all cases, uses are
best protected with minimum t_cost.  There may be an attacker out there who
is forced to do a 1/2 memory TMTO to fit into his RAM, but the increase in
computation will hurt him in terms of cost per guess.  The only use case I
see for t_cost > 0 is when the defender has a fixed amount of memory and
does not mind waiting a very long time for hashing to complete.  It's
talked about a lot on this list, but I've never seen this used in the wild.

Personally, I wish t_cost could be dropped from the API for memory-hard
hashing.  Users will not correctly guess that their protection is maximized
by minimizing t_cost.


> As to a possible tweak, I'd appreciate attacks on scrypt's shuffling and
> on the reverse order sub-blocks (as I suggested in another message).
> While it can be said that these are attacked by storing additional
> intermediate sub-blocks, that general statement isn't actionable for me
> to choose the more effective mitigation, nor to document exactly how
> effective or not it is.
>

Some more analysis would be good.  This is the kind of thing the Argon guys
seem pretty good at.


> Another option is to adopt Argon2's compression function and introduce
> tunable parallelism (optional data dependencies) and pwxform rounds into
> it, although I doubt I'd reach bcrypt-like frequency of S-box lookups
> there (as I'd need to keep the BLAKE2b rounds as well, for full
> diffusion, or otherwise this would be pointless).  If so, I'd also
> replace the uses of Salsa20 with BLAKE2b, but even then the overall
> complexity of full implementation would increase (as Salsa20 would also
> be needed when scrypt compatibility is compiled in).  So it's not an
> option I like.  Something like this may be more appropriate for a
> revision of Argon2 than for a revision of yescrypt.
>
> Alexander
>

I like Yescrypt as it stands.  Salso20 is fine for the mixing between
blocks.  It's probably overkill.  Scrypt compatibility is another thing
entirely... I'm not sure it's worth the additional complexity to retain it,
but it is a nice feature.

In a stripped down version, I'd like to drop Scrypt compatibility, fix
t_cost to 0, and use one thread with 1/3 as many transform rounds, which is
what I think the default should be when running on 1 thread.

Bill

Content of type "text/html" skipped