Message-ID: <CALW8-7J7Sc8T5iL+8SVstqe_n8-W29DCC71T+w7=tMDMpvj=Kw@mail.gmail.com>
Date: Fri, 3 Apr 2015 21:09:14 +0200
From: Dmitry Khovratovich <khovratovich@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] yescrypt throughput vs. PWXrounds
Alexander,
I also observed similar behaviour for Argon2, though not to that
extreme. When changing the number of Blake2b rounds from 2 to 10, I got
only a 70% decrease in speed for 8 threads. This suggests that the
current performance is much more bandwidth- than
computation-bound. It also suggests that the computation hardening can
be increased at the user's discretion with relatively little performance
penalty.
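
To make the arithmetic behind that claim explicit, here is a minimal Python sketch. It assumes "70% decrease in speed" means throughput drops to 30% of its previous value; the function name and that reading are illustrative assumptions, not something taken from the Argon2 code.

```python
def relative_compute_per_time(rounds_before, rounds_after, speed_decrease):
    """Ratio of (rounds * throughput) after the change to before it.

    speed_decrease is the fractional drop in hashes/second
    (assumed reading: 0.70 means throughput falls to 30%).
    """
    throughput_ratio = 1.0 - speed_decrease
    return (rounds_after / rounds_before) * throughput_ratio


# 2 -> 10 Blake2b rounds with a 70% throughput drop (figures from the text).
ratio = relative_compute_per_time(2, 10, 0.70)
print(f"compute per unit time changes by a factor of {ratio:.2f}")
```

Under that reading, five times the rounds at 30% of the throughput still gives about 1.5 times more round computations per second, which is consistent with the workload being limited by memory bandwidth rather than by computation.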
Could you also clarify how you measure "GPU-unfriendliness"?
Dmitry
On Fri, Apr 3, 2015 at 8:05 PM, Solar Designer <solar@...nwall.com> wrote:
> Bill, all -
>
> FWIW, here's what I am getting on FX-8120 with 2x DDR3-1600 (I should
> probably re-do this on more machines). The first column is number of
> rounds of pwxform (the current default is 6), followed by throughput in
> hashes/second for 8 threads / 1 thread for 2 MB, 128 MB, and 2 MB RAM +
> 2 GB ROM. For the multi-thread throughput figures, the threads are
> independent (simulating an authentication server) and the total amount
> of RAM is what's shown times the number of threads (so 1 GB for the
> 8-thread tests in the 128 MB column).
>
> rounds   2 MB          128 MB    2 MB + 2 GB ROM
>      6   2772 /  511   30 /  7   2592 /  486
>      4   3653 /  691   32 /  9   3269 /  647
>      2   5340 / 1077   33 / 13   4288 /  974
>      1   6454 / 1451   33 / 15   4760 / 1255
>
> As you can see, when using only RAM and being out of cache and running
> as many threads as the hardware supports (8 on this CPU), there's only
> a 10% speedup possible from reducing PWXrounds from 6 to 1. OTOH, when
> the machine is under-loaded, running only 1 thread, there's a 2x+
> speedup possible (7 to 15 hashes/second in 1 thread). I optimized for
> best behavior when server capacity is reached (because that's what
> limits the cost settings), as well as for multi-threaded KDF use. For
> this, the choice of 6 rounds still looks good to me. BTW, looking at
> these numbers another way, it's 3 GB memory filled (and 8 GB of
> bandwidth used) in 1 second, despite the high PWXrounds setting.
> This can be improved to 3.3 GB (and 9 GB bandwidth usage). Worth it?
> I'd rather opt for the 10% lower memory and bandwidth usage figure, but
> gain diversity of defense (3x or 6x higher compute hardening).
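
The speedup figures in the quoted paragraph above can be checked directly from the 128 MB column of the table. This is only a sketch: the dictionary layout and the helper name are hypothetical, and the numbers are copied from the table as quoted.

```python
# Throughput in hashes/second for 128 MB, copied from the quoted table.
# Key: PWXrounds. Value: (8 threads, 1 thread).
throughput_128mb = {6: (30, 7), 4: (32, 9), 2: (33, 13), 1: (33, 15)}

EIGHT_THREADS, ONE_THREAD = 0, 1


def speedup(table, rounds_from, rounds_to, column):
    """Throughput ratio when PWXrounds goes from rounds_from to rounds_to."""
    return table[rounds_to][column] / table[rounds_from][column]


loaded = speedup(throughput_128mb, 6, 1, EIGHT_THREADS)
underloaded = speedup(throughput_128mb, 6, 1, ONE_THREAD)
print(f"8 threads: {loaded:.2f}x, 1 thread: {underloaded:.2f}x")
```

With 8 threads the ratio is 33/30, i.e. the 10% mentioned; with 1 thread it is 15/7, a bit over 2x.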
>
> When much of the RAM portion fits in a cache, there's significant
> speedup from lower PWXrounds, even when running 8 threads. However, the
> speedup is not enough to keep the compute hardening per time the same.
> For example, 2772*6 / (3653*4) = 1.14, but 6/4 = 1.5, and
> 2772*6 / (5340*2) = 1.56, but 6/2 = 3. So going for PWXrounds = 2 would
> halve the compute hardening per time. Maybe that's OK, but I wouldn't
> be able to claim that yescrypt achieves bcrypt-like frequency(*) of its
> S-box lookups and thus is at least as GPU-unfriendly as bcrypt even at
> the lowest m_cost settings. Would being no more than 2x worse than
> bcrypt still be OK? I'm not sure. I would be uncomfortable about that,
> even though bcrypt isn't one of the PHC finalists. ;-)
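
The "compute hardening per time" comparison in the quoted paragraph above is rounds multiplied by throughput. A small sketch that reproduces the two quoted ratios from the 2 MB, 8-thread column follows; the variable and function names are illustrative only.

```python
# 8-thread throughput in hashes/second for 2 MB, copied from the quoted table.
# Key: PWXrounds.
throughput_2mb_8threads = {6: 2772, 4: 3653, 2: 5340, 1: 6454}


def hardening_ratio(table, baseline_rounds, rounds):
    """(rounds * throughput) at the baseline divided by the same at `rounds`.

    A value above 1 means the baseline setting performs more pwxform
    rounds per second than the reduced setting does.
    """
    baseline = table[baseline_rounds] * baseline_rounds
    reduced = table[rounds] * rounds
    return baseline / reduced


for rounds in (4, 2):
    ratio = hardening_ratio(throughput_2mb_8threads, 6, rounds)
    print(f"6 vs {rounds} rounds: {ratio:.2f} (rounds ratio {6 / rounds:.1f})")
```

The ratios come out as 1.14 and 1.56, matching the quoted figures, while the rounds ratios are 1.5 and 3: throughput rises with fewer rounds, but not by enough to keep the per-second round count constant.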
>
> (*) Also considered are parallelism of the S-box lookups and total size
> of the S-boxes.
>
> Should we have PWXrounds (auto-)tuned differently for the
> single-threaded case? With password hashing use, yescrypt being invoked
> with p=1 doesn't mean there isn't another instance running concurrently.
> In fact, in terms of capacity planning we should assume that there are
> as many such instances as the hardware supports. Should we have some
> kind of heuristics (or a flag?) to determine KDF use (e.g., size of
> 512 MB or more?), and if p=1 then reduce PWXrounds? This feels like too
> much complexity and unexpected behavior, and yescrypt is too complex as
> it is.
>
> While I don't mind auto-tuning of PWXgather and PWXsimple for the
> current machine (and getting them encoded along with the hashes or e.g.
> with the encrypted filesystem), auto-tuning of PWXrounds is different
> (will vary by other yescrypt parameters and expected system load, rather
> than only by underlying CPU).
>
> Alexander
--
Best regards,
Dmitry Khovratovich