phc-discussions - Re: [PHC] Argon2 modulo division

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALW8-7KU=MMutw6y3YdNa7d27_bkv54-bkR12NdxJ=gf+mWQYQ@mail.gmail.com>
Date: Wed, 29 Apr 2015 15:51:01 +0200
From: Dmitry Khovratovich <khovratovich@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Argon2 modulo division

HI Alexander,

thank you for the detailed analysis.

We indeed have considered changing the addressing in Argon2d/i  (I
think I mentioned it in an earlier e-mail) after we analyzed
multi-pass TMTO resistance.

However, the uniformity would not be the goal, but rather the higher
computational/depth penalties for an attacker.

All the tradeoff methods I have tried accumulated highest penalties in
latest blocks. Therefore, the latest blocks should be asked rather
more often - but not always (sliding window), as then the attacker
would not store the first blocks at all.

So if W(i) is the expected penalty for i-th block, the probability to
request it should be proportional to 1/W(i).  The problem is that the
computational penalty C(i) and depth penalty D(i) grow differently
(the latter almost linearly), so only one of penalty types can be
maximized that way.

We planned to have a small report on this idea and subsequent
modification to Argon2, but other deadlines have pressured on us
recently.

Dmitry

On Wed, Apr 29, 2015 at 3:29 PM, Solar Designer <solar@...nwall.com> wrote:
> On Tue, Apr 21, 2015 at 06:26:31AM +0300, Solar Designer wrote:
>> It appears that Argon2 uses modulo division with arbitrary (and
>> changing) divisors (usually not powers of 2).  Argon2d applies this to
>> secret-dependent integers.  This is an extra source of timing leaks, on
>> top of secret-dependent lookup addresses.
>>
>> Do we consider this a drawback?
>>
>> TwoCats had this too, but I avoided this maybe-drawback in yescrypt.
>
> Another aspect of Argon2's approach, and an aspect that both yescrypt
> and TwoCats tried to address in different ways, is the highly
> non-uniform memory access pattern.
>
> I've just analyzed the block indices from 1 GB runs of Argon2d and
> yescrypt, both at lowest supported t_cost.  In yescrypt's case, I
> excluded from analysis its initial 1/64 m_cost (thus, 16 MB)
> sub-invocation, which it does to mitigate garbage collector attacks.
> (If included, it'd obviously show a higher hit rate to the first 16 MB
> portion.  But that's only 1/65 of the running time.)
>
> For Argon2d, I got 1048573 total indices, of which 524095 are unique
> (that's about 50% as expected for its algorithm).  Here's the histogram
> of access count by memory region (1 GB split into 50 regions).  (I hope
> your mail readers don't mangle it too badly.)
>
> 9.81%
> O
> O
> O
> O
> O
> Oo
> OO
> OOo
> OOO
> OOOO
> OOOOO
> OOOOOOo
> OOOOOOOOo
> OOOOOOOOOOo
> OOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOooo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOoooo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOooooo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoooooo
>
> As you can see, almost 10% of accesses go to the first 2% of memory.
> Here's a smaller, 4 column 10 row histogram showing distribution by
> quarters of the total memory:
>
> 59.67%
> O
> O
> O
> O
> O
> Oo
> OO
> OO
> OOO
> OOOO
>
> As you can see, almost 60% of accesses go to the first quarter.  And for
> halves of the total memory we get:
>
> 84.71%
> O
> O
> O
> O
> OO
>
> So attackers have fairly strong incentive to provide faster memory for
> lower block indices and slower, cheaper, or/and more distant memory for
> higher indices.
>
> For yescrypt, I got 1398100 total indices (723841 unique, or 69% of the
> 2^20 range), including 1048574 (595780 unique, or 56.8%) in SMix1 and
> 349526 (297265 unique, or 85% of these or 28.3% of the 2^20 range) in
> SMix2 (running for 1/3 of N).
>
> For SMix1 alone, which is most directly comparable to the Argon2d run
> above, the histograms are:
>
> 3.96%
> OOo
> OOOOo
> OOOOOOOo
> OOOOOOOOOo
> OOOOOOOOOOOOo
> OOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
>
> 43.73%
> O
> O
> Oo
> OO
> OO
> OOo
> OOO
> OOO
> OOOo
> OOOO
>
> 74.99%
> O
> O
> O
> OO
> OO
>
> As you can see, there's also significant bias towards lower addresses,
> but it's not as bad as in Argon2d.  (And from the algorithm we know that
> it actually achieves uniform access across the smaller sliding window,
> which gives us a lower bound on the achieved non-TMTO area-time product.)
>
> For example, the hit rate for the first 2% of memory is reduced from
> 9.81% for Argon2d to 3.96% for yescrypt's SMix1.  For memory halves,
> it's 84.71% and 15.29% for Argon2d, vs. 75% and 25% for yescrypt SMix1.
>
> For the sake of completeness, here are histograms for yescrypt's SMix2:
>
> 2.04%
> OoOOOOOOOOoOOOOOoOoOOOOOOOOoOOooOoOo oOOOOOOoOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
>
> and for SMix1 and SMix2 combined:
>
> 3.48%
> OOoo
> OOOOOo
> OOOOOOOoo
> OOOOOOOOOOoo
> OOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
>
> Indeed, for SMix2 and for higher t_cost uniform access becomes easy, in
> Argon2 too.  It's the initial memory filling where non-trivial tradeoffs
> exist.
>
> Dmitry, maybe you'd consider adopting yescrypt's Wrap() or TwoCats'
> "cubed distribution" for Argon2?  You've already seen that Wrap()
> improves TMTO resistance, too.
>
> Bill, you could want to show how TwoCats behaves in this respect.
> I guess it'd win over yescrypt, since I guess you specifically chose
> the "cubed distribution" to achieve an overall closer to uniform
> distribution, whereas I focused on achieving exactly uniform
> distribution for the smaller sliding window, increasing the number of
> unique indices (the 56.8% figure above, vs. Argon2d's 50%), as well as
> avoiding the modulo division for reasons stated before.  ... or did you
> tune for higher TMTO resistance rather than for uniformity per se?
>
> Alexander



-- 
Best regards,
Dmitry Khovratovich