phc-discussions - Re: [PHC] Argon is highly parallelizable...

Message-ID: <20140826213818.GA18588@openwall.com>
Date: Wed, 27 Aug 2014 01:38:18 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Argon is highly parallelizable...

On Tue, Aug 26, 2014 at 02:18:29PM -0400, Bill Cox wrote:
> I had not realized that the Argon authors claimed that t_cost == 3 is
> the minimum safe number of rounds for 1GiB, and for 10MiB, it was 236.
>  Running the benchmarks again with these numbers:
> 
> 10MiB case:
> 
> Linux-AES-NI> time Argon-Optimized -taglength 32 -logmcost 10 -tcost
> 234 -pwdlen 64 -saltlen 16 -threads 5
> Memory allocated: 1 MBytes, 5 threads
> Argon:  346.15 cpb 338.04 Mcycles 0.3923 seconds
> 
> real	0m0.106s
> user	0m0.347s
> sys	0m0.047s

You said "for 10MiB, it was 236", but then benchmarked 1 MiB and 234?

This sort of TMTO safety margin surely makes Argon look relatively
useless for user authentication either way.  At this memory usage per
hash, we should be talking about hundreds or thousands of
authentication attempts per second.
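
For a rough sense of the gap, here is a small sketch that turns the
"real" time quoted above into a throughput figure.  It assumes hashes
are computed back to back, one at a time, with each one using the whole
machine as in the quoted run, and it counts process startup as part of
the cost.  The 100/s target is simply the low end of "hundreds".

```python
# Wall-clock time of the quoted 1 MiB, t_cost=234, 5-thread Argon run.
real_seconds = 0.106

# Assumption: one hash at a time, back to back, whole machine per hash.
hashes_per_second = 1.0 / real_seconds

# Low end of "hundreds or thousands" of authentication attempts per second.
target_low = 100
shortfall = target_low / hashes_per_second

print(f"Argon at 1 MiB, t_cost=234: {hashes_per_second:.1f} hashes/s")
print(f"Short of {target_low}/s by a factor of {shortfall:.1f}")
```

Under these assumptions that is under 10 hashes per second, so roughly
an order of magnitude short of even the low end.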

BTW, why 5 threads?  Was it the fastest choice in your testing?

> 1GiB case:
> 
> Linux-AES-NI> time Argon-Optimized -taglength 32 -logmcost 20 -tcost 3
> -pwdlen 64 -saltlen 16 -threads 8
> Memory allocated: 1024 MBytes, 8 threads
> Argon:  13.25 cpb 13250.55 Mcycles 18.4738 seconds
> 
> real	0m4.097s
> user	0m18.329s
> sys	0m0.146s

This isn't nearly as bad, but it is still ~16x slower than yescrypt t=0,
including memory allocation overhead (and more than that if we don't
count the overhead).  With phc-test modified to test just 1 GiB, via the
slow interface with memory (de)allocation overhead on each try:

solar@...l:~/yescrypt/t$ time ./phc-test >/dev/null
m_cost=17 (1048576 KiB), t_cost=0
4 c/s real, 0 c/s virtual (258 hashes in 63.78 seconds)

real    1m3.778s
user    5m38.425s
sys     1m24.073s

Core i7-4770K, 8 threads.

Without YESCRYPT_PARALLEL_SMIX, so with scrypt style parallelism, it's
almost the same speed:

solar@...l:~/yescrypt/t$ time ./phc-test >/dev/null
m_cost=14 (131072 KiB), t_cost=0
3 c/s real, 0 c/s virtual (258 hashes in 64.58 seconds)

real    1m4.579s
user    5m44.078s
sys     1m24.225s

This is actually 1 GiB too, since it is 128 MiB times 8 threads, each
with its own separate region (scrypt style).  This uses more memory
bandwidth, and is more TMTO resilient within each of the 128 MiB regions
(if this notion even makes sense), but it allows for the obvious free 8x
TMTO for each instance via sequential rather than 8x parallel computation.
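
The memory and speed figures for the two yescrypt runs can be checked
with a few lines of Python.  This is only a sketch using the numbers
printed by phc-test above.  That phc-test rounds its "c/s real" value
down to an integer is my assumption, made to explain the "3 c/s" line.

```python
# Per-thread region size and thread count from the scrypt-style run.
per_thread_kib = 131072      # "m_cost=14 (131072 KiB)"
threads = 8
total_kib = per_thread_kib * threads

# Single shared region from the YESCRYPT_PARALLEL_SMIX run.
single_region_kib = 1048576  # "m_cost=17 (1048576 KiB)"

# Hashes and elapsed seconds as printed by phc-test for each run.
rate_parallel_smix = 258 / 63.78
rate_scrypt_style = 258 / 64.58

print(f"scrypt style total: {total_kib} KiB ({total_kib // (1024 * 1024)} GiB)")
print(f"same as single region: {total_kib == single_region_kib}")
print(f"parallel SMix: {rate_parallel_smix:.2f} c/s")
print(f"scrypt style:  {rate_scrypt_style:.2f} c/s")
# Assumption: phc-test truncates the rate when printing it.
print(f"truncated: {int(rate_parallel_smix)} vs. {int(rate_scrypt_style)} c/s")
```

Both runs use the same total amount of memory, and the rates differ by
only about 1%, which is why I call it almost the same speed despite the
"4 c/s" vs. "3 c/s" in the output.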

Either way, it's approx. 4 yescrypt hashes per second vs. one Argon hash
per 4 seconds, so yescrypt is about 16x faster at 1 GiB including
(de)allocation overhead.  I assume that my i7-4770K is about as fast as
Bill's i7-3770.  I am not using AVX2 in these tests.
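
The 16x figure follows directly from the numbers quoted in this message.
A minimal sketch of the arithmetic, assuming the Argon "real" time is
the cost of one hash and that the two CPUs are comparable as stated:

```python
# yescrypt: hashes and elapsed seconds from the YESCRYPT_PARALLEL_SMIX run.
yescrypt_hashes = 258
yescrypt_seconds = 63.78

# Argon: wall-clock ("real") time for one 1 GiB hash in Bill's run.
argon_seconds_per_hash = 4.097

yescrypt_rate = yescrypt_hashes / yescrypt_seconds   # hashes per second
argon_rate = 1.0 / argon_seconds_per_hash            # hashes per second
speedup = yescrypt_rate / argon_rate

print(f"yescrypt: {yescrypt_rate:.2f} hashes/s")
print(f"Argon:    {argon_rate:.3f} hashes/s")
print(f"yescrypt is about {speedup:.1f}x faster")
```

With the unrounded inputs the ratio comes out slightly above 16.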

Alexander