phc-discussions - Re: [PHC] Tradeoff cryptanalysis of Catena, Lyra2, and generic memory-hard functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150330123139.GA27522@openwall.com>
Date: Mon, 30 Mar 2015 15:31:40 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Tradeoff cryptanalysis of Catena, Lyra2, and generic memory-hard functions

On Fri, Feb 13, 2015 at 04:45:35PM +0100, Dmitry Khovratovich wrote:
> The report is permanently available at
> http://orbilu.uni.lu/handle/10993/20043  and will be soon added to ePrint
> as well.

Sorry for stating the (maybe) obvious, but here's an additional comment,
which I don't recall seeing on this mailing list yet:

Indices of or some other kind of references to vertices cost memory too,
a few bytes each.  They will be needed to actually mount the attacks,
and their number (and thus total size) may be significant.

For example, Table 6 seems to imply that a massively-parallel machine
with essentially free computing cores would compute the hash with a time
penalty equal to the depth penalty, which is only 26.6 for 1/10 memory.
(While this feels like a bad tradeoff, it could make sense for some
attackers who e.g. only have 1/10 memory in a chip.)  However, with a
small enough block size, storing that tree's structure as needed for
such optimal computation may take a comparable amount of memory to that
for the actual data being stored.

Thus, decreasing block size is a way to make TMTO less efficient.  In an
extreme case, the block size may be comparable to one index size (e.g.,
4 bytes), and this would defeat even mild TMTO attacks (except on
TMTO-friendly schemes like the original scrypt, where it is not
necessary to record any metadata).  However, to significantly discourage
TMTO attacks needing a depth like the 26.6 figure above, block size of a
few hundred bytes would likely work as well (and this is a reasonable
setting for some PHC schemes).

BTW, the extra/sim-tmto.c simple TMTO attack implementation included in
the yescrypt submission accounts for the metadata required to mount the
attack, and reports it like this:

scrypt ircmaxell:
B' = 24efce90
t_cost = 512
m_cost = 256
scrypt ircmaxell TMTO1 = 5, TMTO2 = 2:
B' = 24efce90
t_cost = 978
m_cost = 180 elements + 256 indices (363 alloc + 256 ptrs) + 512 counters

This shows that while the number of "elements" decreased from 256
without TMTO to 180 with TMTO, we also added "256 indices (363 alloc +
256 ptrs) + 512 counters" for this algorithm's metadata.  At small
element size, this would be non-negligible, up to the point where the
TMTO doesn't reduce total memory usage at all despite of reducing the
number of elements significantly.

Here's what I wrote in a private e-mail reply a year ago:

---
Someone wrote:
> [...] I seem to recall from
> my theory of computation class that TMTO is a property of anything
> that can run on a Turing machine, so it seems hopeless to defeat it
> completely,

Sort of.  You can in fact always recompute instead of storing and
retrieving intermediate results, but the theory that the memory needs
can always be reduced, up to the point of needing no memory for any
intermediate results, fails when you realize that referring to
intermediate results that you don't store has non-zero memory cost -
e.g., it can be the size of a set of array indices.

Suppose at some point in computation your algorithm finds out it needs a
past intermediate result it didn't store.  It can initiate recomputation
of that intermediate result, but for that it needs to store some
identifier of that intermediate result somewhere (e.g. in a variable or
in a recursive call's stack frame or on a Turing machine's tape), so
that the re-invoked algorithm can store this one intermediate result
this time.  OK, it's only one such identifier, not a big deal - but what
happens if the same issue had already occurred (and thus will occur
again) during the recomputation of that one intermediate result?  You
now have two identifiers to store, and so on.

Given this theory, it's _not_ hopeless to defeat TMTO completely.  This
may be done by keeping the size of intermediate result identifiers on
par with or larger than the size of those intermediate results
themselves - in other words, making them tiny but numerous - then coming
up with an algorithm that computes the function in the least amount of
memory possible.

However, this might be impractical, for other reasons.

> but I definitely see where there is room for improvement.

Yes, the current focus is on discouraging TMTO (unless it's specifically
desired in a given use case), not on making it impossible (even if that
could be done).
---

Once again, sorry for stating the (maybe) obvious.

Alexander