Message-ID: <20140826213818.GA18588@openwall.com>
Date: Wed, 27 Aug 2014 01:38:18 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Argon is highly parallelizable...
On Tue, Aug 26, 2014 at 02:18:29PM -0400, Bill Cox wrote:
> I had not realized that the Argon authors claimed that t_cost == 3 is
> the minimum safe number of rounds for 1GiB, and for 10MiB, it was 236.
> Running the benchmarks again with these numbers:
>
> 10MiB case:
>
> Linux-AES-NI> time Argon-Optimized -taglength 32 -logmcost 10 -tcost
> 234 -pwdlen 64 -saltlen 16 -threads 5
> Memory allocated: 1 MBytes, 5 threads
> Argon: 346.15 cpb 338.04 Mcycles 0.3923 seconds
>
> real 0m0.106s
> user 0m0.347s
> sys 0m0.047s
You said "for 10MiB, it was 236", but then benchmarked 1 MiB and 234?
This sort of TMTO safety margin surely makes Argon look relatively
useless for user authentication either way. We should be talking
hundreds or thousands of authentication attempts per second at this
memory usage per hash.
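As a rough check of the gap, the sketch below converts the wall-clock time from the quoted run (`real 0m0.106s`, 1 MiB, 5 threads) into hashes per second for the whole machine. The variable names are mine, and the 100/s target is only an assumed stand-in for the low end of "hundreds".

```python
# Back-of-the-envelope throughput for the quoted 1 MiB, t_cost=234 run.
real_seconds = 0.106                      # "real 0m0.106s" from the quoted benchmark
hashes_per_second = 1.0 / real_seconds    # one hash at a time, using 5 threads

# Illustrative target: low end of "hundreds per second" (an assumption).
target_per_second = 100.0
shortfall = target_per_second / hashes_per_second

print(f"{hashes_per_second:.1f} hashes/s, "
      f"{shortfall:.1f}x short of {target_per_second:.0f}/s")
```

So this is on the order of 10 hashes per second, roughly an order of magnitude short of even the low end of that range.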
BTW, why 5 threads? Was it the fastest choice in your testing?
> 1GiB case:
>
> Linux-AES-NI> time Argon-Optimized -taglength 32 -logmcost 20 -tcost 3
> -pwdlen 64 -saltlen 16 -threads 8
> Memory allocated: 1024 MBytes, 8 threads
> Argon: 13.25 cpb 13250.55 Mcycles 18.4738 seconds
>
> real 0m4.097s
> user 0m18.329s
> sys 0m0.146s
This isn't nearly as bad, but it is ~16x slower than yescrypt t=0,
including memory allocation overhead (and more than that if we don't
count the overhead). With phc-test modified to test just 1 GiB, via the
slow interface with memory (de)allocation overhead on each try:
solar@...l:~/yescrypt/t$ time ./phc-test >/dev/null
m_cost=17 (1048576 KiB), t_cost=0
4 c/s real, 0 c/s virtual (258 hashes in 63.78 seconds)
real 1m3.778s
user 5m38.425s
sys 1m24.073s
Core i7-4770K, 8 threads.
Without YESCRYPT_PARALLEL_SMIX, so with scrypt style parallelism, it's
almost the same speed:
solar@...l:~/yescrypt/t$ time ./phc-test >/dev/null
m_cost=14 (131072 KiB), t_cost=0
3 c/s real, 0 c/s virtual (258 hashes in 64.58 seconds)
real 1m4.579s
user 5m44.078s
sys 1m24.225s
This is actually 1 GiB too: 128 MiB times 8 threads, each with its own
separate region (scrypt style). This uses more memory bandwidth, and
is more TMTO resilient within each of the 128 MiB regions (if this
notion even makes sense), but it allows for the obvious free 8x TMTO for
each instance via sequential rather than 8x parallel computation.
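To spell out the memory arithmetic, here is a minimal sketch. It only models peak memory, not yescrypt itself, and the names are mine. It assumes an attacker computes the 8 independent regions one after another and reuses the same buffer.

```python
# Peak memory for scrypt-style parallelism: 8 independent 128 MiB regions.
MIB = 1024 * 1024
region_bytes = 131072 * 1024      # "131072 KiB" from the phc-test output
threads = 8

parallel_peak = region_bytes * threads   # defender: all regions live at once
sequential_peak = region_bytes           # attacker: one region at a time, reused

tmto_factor = parallel_peak // sequential_peak

print(parallel_peak // MIB, "MiB parallel vs",
      sequential_peak // MIB, "MiB sequential:",
      f"{tmto_factor}x less memory for about {threads}x the latency")
```

The total work is the same either way, which is why this 8x memory reduction is "free" in terms of computation.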
Either way, it's approx. 4 yescrypt/second vs. one Argon per 4 seconds,
so yescrypt is ~16x faster at 1 GiB, including (de)allocation overhead.
I assume that the i7-4770K is about as fast as Bill's i7-3770. I am not
using AVX2 in these tests.
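For reference, the ratio can be recomputed from the figures above. This is only arithmetic on the reported numbers (258 hashes in 63.78 and 64.58 seconds for yescrypt, 4.097 seconds real for one Argon hash). The variable names are mine.

```python
# Recompute the ~16x figure from the reported timings at 1 GiB.
argon_seconds_per_hash = 4.097           # "real 0m4.097s", t_cost=3, 8 threads
argon_rate = 1.0 / argon_seconds_per_hash

# yescrypt t=0: 258 hashes per run, in two configurations.
yescrypt_rate_smix = 258 / 63.78         # one shared 1 GiB region
yescrypt_rate_scrypt_style = 258 / 64.58 # 8 separate 128 MiB regions

ratio_smix = yescrypt_rate_smix / argon_rate
ratio_scrypt_style = yescrypt_rate_scrypt_style / argon_rate

print(f"yescrypt: {yescrypt_rate_smix:.2f} and "
      f"{yescrypt_rate_scrypt_style:.2f} hashes/s")
print(f"Argon: {argon_rate:.3f} hashes/s")
print(f"ratios: {ratio_smix:.1f}x and {ratio_scrypt_style:.1f}x")
```

Both configurations come out a little above 16x. The "3 c/s" in the second run appears to be just under 4 c/s, truncated to an integer.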
Alexander