phc-discussions - Re: [PHC] Argon2 modulo division

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150429132944.GA17387@openwall.com>
Date: Wed, 29 Apr 2015 16:29:44 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Argon2 modulo division

On Tue, Apr 21, 2015 at 06:26:31AM +0300, Solar Designer wrote:
> It appears that Argon2 uses modulo division with arbitrary (and
> changing) divisors (usually not powers of 2).  Argon2d applies this to
> secret-dependent integers.  This is an extra source of timing leaks, on
> top of secret-dependent lookup addresses.
> 
> Do we consider this a drawback?
> 
> TwoCats had this too, but I avoided this maybe-drawback in yescrypt.

Another aspect of Argon2's approach, and an aspect that both yescrypt
and TwoCats tried to address in different ways, is the highly
non-uniform memory access pattern.

I've just analyzed the block indices from 1 GB runs of Argon2d and
yescrypt, both at lowest supported t_cost.  In yescrypt's case, I
excluded from analysis its initial 1/64 m_cost (thus, 16 MB)
sub-invocation, which it does to mitigate garbage collector attacks.
(If included, it'd obviously show a higher hit rate to the first 16 MB
portion.  But that's only 1/65 of the running time.)

For Argon2d, I got 1048573 total indices, of which 524095 are unique
(that's about 50% as expected for its algorithm).  Here's the histogram
of access count by memory region (1 GB split into 50 regions).  (I hope
your mail readers don't mangle it too badly.)

9.81%
O                                                 
O                                                 
O                                                 
O                                                 
O                                                 
Oo                                                
OO                                                
OOo                                               
OOO                                               
OOOO                                              
OOOOO                                             
OOOOOOo                                           
OOOOOOOOo                                         
OOOOOOOOOOo                                       
OOOOOOOOOOOOOoo                                   
OOOOOOOOOOOOOOOOOoo                               
OOOOOOOOOOOOOOOOOOOOOooo                          
OOOOOOOOOOOOOOOOOOOOOOOOOOOoooo                   
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOooooo           
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoooooo

As you can see, almost 10% of accesses go to the first 2% of memory.
Here's a smaller, 4 column 10 row histogram showing distribution by
quarters of the total memory:

59.67%
O   
O   
O   
O   
O   
Oo  
OO  
OO  
OOO 
OOOO

As you can see, almost 60% of accesses go to the first quarter.  And for
halves of the total memory we get:

84.71%
O 
O 
O 
O 
OO

So attackers have fairly strong incentive to provide faster memory for
lower block indices and slower, cheaper, or/and more distant memory for
higher indices.

For yescrypt, I got 1398100 total indices (723841 unique, or 69% of the
2^20 range), including 1048574 (595780 unique, or 56.8%) in SMix1 and
349526 (297265 unique, or 85% of these or 28.3% of the 2^20 range) in
SMix2 (running for 1/3 of N).

For SMix1 alone, which is most directly comparable to the Argon2d run
above, the histograms are:

3.96%
OOo                                               
OOOOo                                             
OOOOOOOo                                          
OOOOOOOOOo                                        
OOOOOOOOOOOOo                                     
OOOOOOOOOOOOOOo                                   
OOOOOOOOOOOOOOOOO                                 
OOOOOOOOOOOOOOOOOOOoo                             
OOOOOOOOOOOOOOOOOOOOOoo                           
OOOOOOOOOOOOOOOOOOOOOOOOo                         
OOOOOOOOOOOOOOOOOOOOOOOOOOoo                      
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOo                    
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo                 
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo               
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo            
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo          
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo        
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo     
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo  
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo

43.73%
O   
O   
Oo  
OO  
OO  
OOo 
OOO 
OOO 
OOOo
OOOO

74.99%
O 
O 
O 
OO
OO

As you can see, there's also significant bias towards lower addresses,
but it's not as bad as in Argon2d.  (And from the algorithm we know that
it actually achieves uniform access across the smaller sliding window,
which gives us a lower bound on the achieved non-TMTO area-time product.)

For example, the hit rate for the first 2% of memory is reduced from
9.81% for Argon2d to 3.96% for yescrypt's SMix1.  For memory halves,
it's 84.71% and 15.29% for Argon2d, vs. 75% and 25% for yescrypt SMix1.

For the sake of completeness, here are histograms for yescrypt's SMix2:

2.04%
OoOOOOOOOOoOOOOOoOoOOOOOOOOoOOooOoOo oOOOOOOoOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

and for SMix1 and SMix2 combined:

3.48%
OOoo                                              
OOOOOo                                            
OOOOOOOoo                                         
OOOOOOOOOOoo                                      
OOOOOOOOOOOOOoo                                   
OOOOOOOOOOOOOOOOo                                 
OOOOOOOOOOOOOOOOOOOoo                             
OOOOOOOOOOOOOOOOOOOOOOoo                          
OOOOOOOOOOOOOOOOOOOOOOOOOoo                       
OOOOOOOOOOOOOOOOOOOOOOOOOOOOo                     
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo                  
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo               
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo            
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo         
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo      
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo   
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOoo
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

Indeed, for SMix2 and for higher t_cost uniform access becomes easy, in
Argon2 too.  It's the initial memory filling where non-trivial tradeoffs
exist.

Dmitry, maybe you'd consider adopting yescrypt's Wrap() or TwoCats'
"cubed distribution" for Argon2?  You've already seen that Wrap()
improves TMTO resistance, too.

Bill, you could want to show how TwoCats behaves in this respect.
I guess it'd win over yescrypt, since I guess you specifically chose
the "cubed distribution" to achieve an overall closer to uniform
distribution, whereas I focused on achieving exactly uniform
distribution for the smaller sliding window, increasing the number of
unique indices (the 56.8% figure above, vs. Argon2d's 50%), as well as
avoiding the modulo division for reasons stated before.  ... or did you
tune for higher TMTO resistance rather than for uniformity per se?

Alexander

View attachment "modan.pl" of type "text/plain" (405 bytes)