Date: Thu, 30 Apr 2015 04:39:34 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Added multi-threading support to test suite

On Wed, Apr 29, 2015 at 9:37 PM, Solar Designer <solar@...nwall.com> wrote:

> On Wed, Apr 29, 2015 at 12:15:41PM -0700, Bill Cox wrote:
> > Algorithm          Speed (in ms)
> > --------------------------------
> > Argon2d-sse          151
> > Yescrypt-2pw-sse     160
> > Yescrypt-sse         175
> > Lyra2-sse            258
> > Argon               1620
> >
> > All but Argon are memory-bandwidth limited.  Argon is external cache-miss
> > penalty limited, and is not well suited as an Scrypt upgrade (it would be a
> > downgrade, IMO).  However, since the PHC panel has not yet determined
> > whether to allow Argon2 into the competition, I've included Argon's
> > performance here.  Hopefully, this adds some support for allowing Argon2.
> >
> > Argon2d, Yescrypt, and Lyra2 all provide excellent defense, IMO.  I think
> > the best defensive runs are, in order of defense:
> >
> > Yescrypt-2pw-sse with 4 threads, hashing 1GiB in 167ms
> > Yescrypt-sse with 12 threads, hashing 1GiB in 175ms
> > Argon2d-sse with 8 threads, hashing 1GiB in 155ms
> > Lyra2-sse with 4 threads, hashing 1GiB in 218ms
> >
> > If I understand correctly, the 2-round Yescrypt-2pw-sse run is slightly
> > more compute-time hardened than the 6-round Yescrypt-sse run.  The 6-round
> > version does make better use of all 6 of my CPU cores, but I do not think
> > an attacker will be very computation core limited.  I would rather just use
> > 4 cores and get better runtime and compute-time hardening.
>
> This makes sense, but on the other hand:
>
> 1. If we just set PWXrounds=2, this means that people who will run 12
> threads on a machine like yours will get almost 3x worse compute-time
> hardening defense than they do now.  (160 ms, 167 ms, and 175 ms are
> similar, so I am primarily looking at other differences.)  We can't
> expect apps and users to always tune for optimal number of threads.
> And on servers, request rate capacity is decided by what happens at
> highest load.
>

Getting the thread count right is pretty important.  The single-thread case
is going to be very common, probably followed by using the number of cores
(or cores - 1).  In the case of 12 threads, that's twice the number of
cores on my machine.  Hyper-threading is nice, but those last 6 threads
give an attacker more parallelism than is warranted, IMO.
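One way to read the "almost 3x" in point 1 uses a deliberately simple model. That model is an assumption made here and is not taken from yescrypt's source: per-thread compute-time hardening is proportional to PWXrounds, optionally normalized by measured runtime. The 175 ms and 160 ms figures are read as the 12-thread Yescrypt-sse and Yescrypt-2pw-sse runs from the table, as Solar's remark about them suggests. `hardening_per_ms` is a hypothetical helper.

```python
# Hypothetical model, not yescrypt code: per-thread compute-time hardening
# is taken to be proportional to PWXrounds.
def hardening_per_ms(pwx_rounds: int, runtime_ms: float) -> float:
    """Sequential PWX rounds of hardening per millisecond of defender runtime."""
    return pwx_rounds / runtime_ms

# 12-thread runs from the table: Yescrypt-sse (6 rounds), Yescrypt-2pw-sse (2 rounds)
current = hardening_per_ms(6, 175)
lowered = hardening_per_ms(2, 160)

raw_ratio = 6 / 2                     # ignoring the small runtime difference
normalized_ratio = current / lowered  # accounting for 175 ms vs. 160 ms
print(f"raw {raw_ratio:.2f}x, runtime-normalized {normalized_ratio:.2f}x")
```

This prints `raw 3.00x, runtime-normalized 2.74x`. Under this model, either reading lands close to the 3x Solar describes.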


> 2. If it weren't for the limited memory bandwidth of the machine, your
> 4-thread run would be more susceptible to CPU attacks.  (As it is, it's
> only very slightly more susceptible, as seen from the 167 ms vs. 160 ms
> (non-)difference.)  If this is later attacked on a bigger machine (with
> more memory channels), I'd expect attacks on the 4-thread, 2-round
> version to run much faster than on the 12-thread, 6-round version.  Both
> use roughly the same memory bandwidth on your current machine, but the
> 4-thread version would leave more of the new machine's CPUs available to
> take advantage of that machine's greater memory bandwidth.
>

I agree.  The ideal case is to tune the PWX rounds to the machine.
However, I would hate to see Yescrypt lose this competition simply because
of the default 6 rounds.  Most of us doing benchmarks have common desktops
with typically only 2 memory banks.  This should be the default target, IMO.
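Point 2 can be illustrated with a roofline-style throughput model, which is an assumption here rather than anything measured in this thread. In this model, an attacker's hash rate is capped by whichever runs out first: memory bandwidth or CPU cores. All machine parameters below are hypothetical and use arbitrary units. They are chosen so that both variants are bandwidth-bound on a 6-core desktop, as in Bill's runs.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    bandwidth: float  # memory traffic per second (arbitrary units)
    cores: int

def attack_rate(m: Machine, traffic_per_hash: float, core_time_per_hash: float) -> float:
    """Hashes per second: the smaller of the bandwidth limit and the compute limit."""
    return min(m.bandwidth / traffic_per_hash, m.cores / core_time_per_hash)

TRAFFIC = 1.0              # memory traffic per hash, same for both variants
CORE_TIME_PER_ROUND = 1.0  # core-seconds per hash per PWX round (hypothetical)

desktop = Machine(bandwidth=1.0, cores=6)  # few memory channels
server = Machine(bandwidth=4.0, cores=16)  # more channels, more cores

for rounds in (2, 6):
    cost = rounds * CORE_TIME_PER_ROUND
    print(f"PWXrounds={rounds}: desktop {attack_rate(desktop, TRAFFIC, cost):.2f}, "
          f"server {attack_rate(server, TRAFFIC, cost):.2f} hashes/s")
```

With these made-up numbers, both variants run at 1.00 on the desktop. On the server, however, the 2-round variant reaches 4.00, while the 6-round variant is held to about 2.67 by its compute cost. That gives the attacker a 1.5x advantage against the lower round count.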

> 3. The 175 ms vs. 167 ms difference is negligible (and the extra 3x
> parallelism is compensated for by the 3x increase in compute-time
> hardening per thread).  I think it's fair price for #1 and #2 above.
>

Fair enough.  I think the difference is quite small in terms of real
defense, and I could argue either way.  However, no one but me has been
posting multi-thread benchmarks.  Yescrypt's single-thread defaults are
what Milan is showing in every chart.  Yescrypt-2pw-sse looks a lot more
competitive in those charts.


> That said, I hear you and I am considering lowering the default
> PWXrounds or/and making it runtime tunable.  (OTOH, the latter goes
> against simplicity.  So probably not in yescrypt-lite.)


I definitely prefer the 2-round default for Yescrypt-lite.  This will be an
algorithm most likely run on one thread.  I do not think it should have a
multi-threading capability.


>
> > I rate Argon2d-sse after Yescrypt-2pw-sse and Yescrypt-sse for poorer
> > compute-time hardening and GPU defense, and Lyra2-sse after Argon2d-sse for
> > its longer runtime, since memory*time defense goes as the square of the
> > memory hashing speed.
>
> Yeah.  To be fair, yescrypt's GPU defense is important at way lower
> m_cost.  At 1 GB, it's not required, except that it changes the
> compute-time hardening from MUL latency to max(MUL, LUT) latency.


I prefer Yescrypt at large memory sizes, too, for improved compute-time
hardening.  Argon2d is speed-limited by its 16 sequential Blake2b rounds.
An ASIC is likely to speed that up by 20X, and that bothers me a lot.
Improved GPU defense at low memory is just one example of the enhancements
you've put into Yescrypt.  It's the most universal algorithm in the
competition.
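The Lyra2 rating above relies on the claim that memory*time defense goes as the square of memory hashing speed. The sketch below rests on two illustrative assumptions: that, for a fixed runtime budget, the memory filled is proportional to hashing speed, and that an attacker's memory*time cost grows with the square of that memory. Under those assumptions, the relative defense of two runs at equal memory is the squared inverse ratio of their runtimes.

The code applies this to the four 1 GiB runs in Bill's ranking, using Argon2d-sse as an arbitrary baseline. The "fill rate" is just memory size divided by runtime. It is not total DRAM traffic, since each algorithm reads and writes memory more than once.

```python
# Ranking runs quoted above: (algorithm, threads, milliseconds to hash 1 GiB)
RUNS = [
    ("Yescrypt-2pw-sse", 4, 167),
    ("Yescrypt-sse", 12, 175),
    ("Argon2d-sse", 8, 155),
    ("Lyra2-sse", 4, 218),
]

def fill_rate_gib_s(ms: float) -> float:
    """GiB of memory filled per second (not total DRAM traffic)."""
    return 1.0 / (ms / 1000.0)

def relative_defense(ms: float, baseline_ms: float) -> float:
    """Memory*time defense vs. a baseline, assuming it scales as speed squared."""
    return (baseline_ms / ms) ** 2

BASELINE_MS = 155  # Argon2d-sse, chosen arbitrarily as the reference
for name, threads, ms in RUNS:
    print(f"{name:18} {threads:2} threads  {fill_rate_gib_s(ms):5.2f} GiB/s  "
          f"{relative_defense(ms, BASELINE_MS):.2f}x vs. Argon2d-sse")
```

Lyra2-sse comes out at about 0.51x, roughly half the memory*time defense of Argon2d-sse at the same budget. This model captures only that one term. It says nothing about the compute-time hardening or GPU defense that Bill uses to rank the Yescrypt runs first.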

> ---------
> >
> > $ ./tst-lyra2-sse -p1 -m43700 -t1
> > 8 32 43700 1 680 1044960
> >
> > $ ./tst-lyra2-sse -p2 -m43700 -t1
> > 8 32 43700 1 345 1044952
> >
> > $ ./tst-lyra2-sse -p4 -m43700 -t1
> > 8 32 43700 1 218 1045024
> >
> > $ ./tst-lyra2-sse -p8 -m43700 -t1
> > 8 32 43700 1 235 1045000
> >
> > $ ./tst-lyra2-sse -p12 -m43700 -t1
> > 8 32 43700 1 258 1044620
>
> The slowdown seen here when going from 4 to 8 or 12 threads is nasty,
> especially on servers.  This, too, is something I tried to avoid when
> not setting PWXrounds lower.
>

You're right, but I think this is mostly a problem for dedicated authentication
servers, not general machines in data centers.  I doubt you'll get more
than one thread on generic data center hardware, so you'd be better off
with 2 rounds in Yescrypt.
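The Lyra2 thread-count sweep quoted above can be turned into speedup and parallel-efficiency figures. Reading the fifth output field of `tst-lyra2-sse` as milliseconds is an inference: the `-p4` and `-p12` values match the 218 ms and 258 ms figures given earlier in the thread. The other fields are ignored here.

```python
# (threads, ms) from the tst-lyra2-sse runs; ms is the fifth output field
LYRA2_RUNS = {1: 680, 2: 345, 4: 218, 8: 235, 12: 258}

def scaling(runs: dict[int, int]) -> dict[int, tuple[float, float]]:
    """Return {threads: (speedup vs. 1 thread, parallel efficiency)}."""
    base = runs[1]
    return {p: (base / t, base / t / p) for p, t in sorted(runs.items())}

for p, (speedup, eff) in scaling(LYRA2_RUNS).items():
    print(f"-p{p:<2}  speedup {speedup:4.2f}  efficiency {eff:4.0%}")

best = min(LYRA2_RUNS, key=LYRA2_RUNS.get)
print("fastest thread count:", best)
```

Speedup peaks at about 3.1x with 4 threads. It then falls to roughly 2.9x at 8 threads and 2.6x at 12, which is the slowdown Solar calls out.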

Bill
