[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150429132944.GA17387@openwall.com>
Date: Wed, 29 Apr 2015 16:29:44 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Argon2 modulo division
On Tue, Apr 21, 2015 at 06:26:31AM +0300, Solar Designer wrote:
> It appears that Argon2 uses modulo division with arbitrary (and
> changing) divisors (usually not powers of 2). Argon2d applies this to
> secret-dependent integers. This is an extra source of timing leaks, on
> top of secret-dependent lookup addresses.
>
> Do we consider this a drawback?
>
> TwoCats had this too, but I avoided this maybe-drawback in yescrypt.
Another aspect of Argon2's approach, and an aspect that both yescrypt
and TwoCats tried to address in different ways, is the highly
non-uniform memory access pattern.
I've just analyzed the block indices from 1 GB runs of Argon2d and
yescrypt, both at lowest supported t_cost. In yescrypt's case, I
excluded from analysis its initial 1/64 m_cost (thus, 16 MB)
sub-invocation, which it does to mitigate garbage collector attacks.
(If included, it'd obviously show a higher hit rate to the first 16 MB
portion. But that's only 1/65 of the running time.)
For Argon2d, I got 1048573 total indices, of which 524095 are unique
(that's about 50% as expected for its algorithm). Here's the histogram
of access count by memory region (1 GB split into 50 regions). (I hope
your mail readers don't mangle it too badly.)
9.81%
O
O
O
O
O
Oo
OO
OOo
OOO
OOOO
OOOOO
OOOOOOo
OOOOOOOOo
OOOOOOOOOOo
OOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOooo
OOOOOOOOOOOOOOOOOOOOOOOOOOOoooo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOooooo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoooooo
As you can see, almost 10% of accesses go to the first 2% of memory.
Here's a smaller, 4 column 10 row histogram showing distribution by
quarters of the total memory:
59.67%
O
O
O
O
O
Oo
OO
OO
OOO
OOOO
As you can see, almost 60% of accesses go to the first quarter. And for
halves of the total memory we get:
84.71%
O
O
O
O
OO
So attackers have fairly strong incentive to provide faster memory for
lower block indices and slower, cheaper, or/and more distant memory for
higher indices.
For yescrypt, I got 1398100 total indices (723841 unique, or 69% of the
2^20 range), including 1048574 (595780 unique, or 56.8%) in SMix1 and
349526 (297265 unique, or 85% of these or 28.3% of the 2^20 range) in
SMix2 (running for 1/3 of N).
For SMix1 alone, which is most directly comparable to the Argon2d run
above, the histograms are:
3.96%
OOo
OOOOo
OOOOOOOo
OOOOOOOOOo
OOOOOOOOOOOOo
OOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
43.73%
O
O
Oo
OO
OO
OOo
OOO
OOO
OOOo
OOOO
74.99%
O
O
O
OO
OO
As you can see, there's also significant bias towards lower addresses,
but it's not as bad as in Argon2d. (And from the algorithm we know that
it actually achieves uniform access across the smaller sliding window,
which gives us a lower bound on the achieved non-TMTO area-time product.)
For example, the hit rate for the first 2% of memory is reduced from
9.81% for Argon2d to 3.96% for yescrypt's SMix1. For memory halves,
it's 84.71% and 15.29% for Argon2d, vs. 75% and 25% for yescrypt SMix1.
For the sake of completeness, here are histograms for yescrypt's SMix2:
2.04%
OoOOOOOOOOoOOOOOoOoOOOOOOOOoOOooOoOo oOOOOOOoOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
and for SMix1 and SMix2 combined:
3.48%
OOoo
OOOOOo
OOOOOOOoo
OOOOOOOOOOoo
OOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Indeed, for SMix2 and for higher t_cost uniform access becomes easy, in
Argon2 too. It's the initial memory filling where non-trivial tradeoffs
exist.
Dmitry, maybe you'd consider adopting yescrypt's Wrap() or TwoCats'
"cubed distribution" for Argon2? You've already seen that Wrap()
improves TMTO resistance, too.
Bill, you could want to show how TwoCats behaves in this respect.
I guess it'd win over yescrypt, since I guess you specifically chose
the "cubed distribution" to achieve an overall closer to uniform
distribution, whereas I focused on achieving exactly uniform
distribution for the smaller sliding window, increasing the number of
unique indices (the 56.8% figure above, vs. Argon2d's 50%), as well as
avoiding the modulo division for reasons stated before. ... or did you
tune for higher TMTO resistance rather than for uniformity per se?
Alexander
View attachment "modan.pl" of type "text/plain" (405 bytes)
Powered by blists - more mailing lists