Date: Thu, 30 Apr 2015 07:37:56 +0300
From: Solar Designer
Subject: Re: [PHC] Added multi-threading support to test suite

On Wed, Apr 29, 2015 at 12:15:41PM -0700, Bill Cox wrote:
> Algorithm          Speed (in ms)
> --------------------------------
> Argon2d-sse          151
> Yescrypt-2pw-sse     160
> Yescrypt-sse         175
> Lyra2-sse            258
> Argon               1620
> All but Argon are memory-bandwidth limited.  Argon is external cache-miss
> penalty limited, and is not well suited as an Scrypt upgrade (it would be a
> downgrade, IMO).  However, since the PHC panel has not yet determined
> whether to allow Argon2 into the competition, I've included Argon's
> performance here.  Hopefully, this adds some support for allowing Argon2.
> Argon2d, Yescrypt, and Lyra2 all provide excellent defense, IMO.  I think
> the best defensive runs are, in order of defense:
> Yescrypt-2pw-sse with 4 threads, hashing 1GiB in 167ms
> Yescrypt-sse with 12 threads, hashing 1GiB in 175ms
> Argon2d-sse with 8 threads, hashing 1GiB in 155ms
> Lyra2-sse with 4 threads, hashing 1GiB in 218ms
> If I understand correctly, the 2-round Yescrypt-2pw-sse run is slightly
> more compute-time hardened than the 6-round Yescrypt-sse run.  The 6-round
> version does make better use of all 6 of my CPU cores, but I do not think
> an attacker will be very computation core limited.  I would rather just use
> 4 cores and get better runtime and compute-time hardening.

This makes sense, but on the other hand:

1. If we just set PWXrounds=2, this means that people who will run 12
threads on a machine like yours will get almost 3x worse compute-time
hardening defense than they do now.  (160 ms, 167 ms, and 175 ms are
similar, so I am primarily looking at other differences.)  We can't
expect apps and users to always tune for the optimal number of threads.
And on servers, request rate capacity is decided by what happens at
the highest load.

2. If it weren't for the limited memory bandwidth of the machine, your
4-thread run would be more susceptible to CPU attacks.  (As it is, it's
only very slightly more susceptible, as seen from the 167 ms vs. 160 ms
(non-)difference.)  If this is later attacked on a bigger machine (with
more memory channels), I'd expect attacks on the 4-thread, 2-round
version to run much faster than on the 12-thread, 6-round version.  Both
use roughly the same memory bandwidth on your current machine, but the
4-thread version would leave more of the new machine's CPUs available to
take advantage of that machine's greater memory bandwidth.

3. The 175 ms vs. 167 ms difference is negligible (and the extra 3x
parallelism is compensated for by the 3x increase in compute-time
hardening per thread).  I think it's a fair price for #1 and #2 above.
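
To make the arithmetic behind #1 and #3 explicit, here is a rough sketch
in Python.  It only uses the figures quoted in this thread, and it
assumes a simplified model in which per-thread compute-time hardening
scales linearly with PWXrounds (hence "almost 3x" above rather than
exactly 3x).  The variable names are illustrative, not yescrypt API.

```python
# Figures quoted in this thread: time to hash 1 GiB on Bill's machine.
two_round = {"rounds": 2, "threads": 4, "ms": 167}
six_round = {"rounds": 6, "threads": 12, "ms": 175}

# ASSUMPTION: per-thread compute-time hardening is proportional to the
# number of pwxform rounds.  This is a simplification for illustration.
rounds_ratio = six_round["rounds"] / two_round["rounds"]

# Extra parallelism that the 12-thread run exposes to an attacker.
threads_ratio = six_round["threads"] / two_round["threads"]

# Relative running time cost of the 6-round, 12-thread configuration.
slowdown_pct = (six_round["ms"] / two_round["ms"] - 1) * 100

print(f"hardening per thread: {rounds_ratio:.1f}x")
print(f"parallelism:          {threads_ratio:.1f}x")
print(f"running time cost:    {slowdown_pct:.1f}%")
```

Under that assumption the 3x extra parallelism and the 3x extra
per-thread hardening cancel out, and what remains is a running time
difference of under 5%.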

That said, I hear you and I am considering lowering the default
PWXrounds and/or making it runtime tunable.  (OTOH, the latter goes
against simplicity.  So probably not in yescrypt-lite.)

> I rate Argon2d-sse after Yescrypt-2pw-sse and Yescrypt-sse for poorer
> compute-time hardening and GPU defense, and Lyra2-sse after Argon2-sse for
> its longer runtime, since memory*time defense goes as the square of the
> memory hashing speed.

Yeah.  To be fair, yescrypt's GPU defense is important at way lower
m_cost.  At 1 GB, it's not required, except that it changes the
compute-time hardening from MUL latency to max(MUL, LUT) latency.

> Lyra2-sse
> ---------
> $ ./tst-lyra2-sse -p1 -m43700 -t1
> 8 32 43700 1 680 1044960
> $ ./tst-lyra2-sse -p2 -m43700 -t1
> 8 32 43700 1 345 1044952
> $ ./tst-lyra2-sse -p4 -m43700 -t1
> 8 32 43700 1 218 1045024
> $ ./tst-lyra2-sse -p8 -m43700 -t1
> 8 32 43700 1 235 1045000
> $ ./tst-lyra2-sse -p12 -m43700 -t1
> 8 32 43700 1 258 1044620

The slowdown seen here when going from 4 to 8 or 12 threads is nasty,
especially on servers.  This, too, is something I tried to avoid when
not setting PWXrounds lower.
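
For reference, the scaling in the quoted Lyra2-sse runs can be tabulated
with a few lines of Python.  This is a sketch that assumes the value
given to -p is the thread count and that the fifth output column is the
running time in ms, which matches the 218 ms and 258 ms figures quoted
earlier in the thread.

```python
# Running times taken from the quoted tst-lyra2-sse output.
# ASSUMPTION: key = value passed to -p (threads), value = time in ms.
timings_ms = {1: 680, 2: 345, 4: 218, 8: 235, 12: 258}

base = timings_ms[1]
# Speedup of each run relative to the single-threaded run.
speedup = {p: base / ms for p, ms in timings_ms.items()}
# Thread count that gave the shortest running time.
best = min(timings_ms, key=timings_ms.get)

for p, ms in timings_ms.items():
    # Slowdown relative to the best run, as a percentage.
    penalty = (ms / timings_ms[best] - 1) * 100
    print(f"-p{p:<2} {ms:4d} ms  speedup {speedup[p]:.2f}x  "
          f"vs best {penalty:+.0f}%")
```

The best time is at 4 threads, and running 12 threads instead costs
roughly 18% in running time on this machine.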


