phc-discussions - Re: [PHC] Lyra2 initial review


Message-ID: <20140417155655.GA9178@openwall.com>
Date: Thu, 17 Apr 2014 19:56:55 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review

On Thu, Apr 17, 2014 at 11:08:25AM -0400, Bill Cox wrote:
> Lyra2:
> Peak memory/second: 1.63 GiB/s
> Average memory/second: 1.22 GiB/s
> Memory bandwidth: 11.4 GiB/s  (highly bandwidth limited)
> 
> yescrypt:
> Peak memory/second: 3.03 GiB/s
> Average memory/second: 2.27 GiB/s
> Memory bandwidth: 6.06 GiB/s (does it do 2 r/w to DRAM per location, or 3?)

yescrypt does 4 r/w per location when YESCRYPT_RW is set (and it is set
in the PHS() API), so this would be 12.12 GiB/s.  It does one random
read and one sequential write per iteration in SMix1, and one random
read+write per iteration in SMix2.  (Resetting YESCRYPT_RW brings it to
only 2 r/w per location like in normal scrypt, but then it'd have to run
1.5 times longer to achieve optimal normalized area-time and it'd be
friendly to TMTO, hence that mode isn't the default.)
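
The correction above is plain arithmetic: estimated bandwidth is the peak
fill rate times the number of reads/writes per location.  A minimal sketch
follows.  The function and constant names are made up for illustration, and
it assumes the quoted 6.06 GiB/s figure was computed as 2 r/w times the peak
rate:

```python
def bandwidth_gib_s(peak_fill_gib_s, accesses_per_location):
    """Estimated memory traffic: fill rate times accesses per location."""
    return peak_fill_gib_s * accesses_per_location

# Accesses per location with YESCRYPT_RW set, per the description above.
SMIX1 = 1 + 1  # one random read + one sequential write per iteration
SMIX2 = 1 + 1  # one random read+write per iteration
YESCRYPT_PEAK = 3.03  # GiB/s, peak memory/second from the quoted results

# Assumption: the quoted figure used 2 r/w per location.
as_reported = bandwidth_gib_s(YESCRYPT_PEAK, 2)
corrected = bandwidth_gib_s(YESCRYPT_PEAK, SMIX1 + SMIX2)
print(f"{as_reported:.2f} GiB/s -> {corrected:.2f} GiB/s")
```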

> TwoCats:
> Peak memory/second: 4.98 GiB/s
> Average memory/second: 2.49 GiB/s
> Memory bandwidth: 9.97 GiB/s

I'd interpret these as TwoCats and yescrypt winning over Lyra2 in terms
of average memory/second, which I think is the most important of the three
metrics here.  yescrypt loses to TwoCats in terms of this metric by about 9%,
but it possibly makes up for that by using 21.6% more memory bandwidth
(some would view that as a drawback, but in this test we were specifically
trying to use as much bandwidth from one thread as we could).
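
For anyone who wants to redo the comparison, here is a rough sketch using
only the rounded figures quoted in this thread (with the corrected 4 r/w
bandwidth for yescrypt).  The helper name is hypothetical, and the exact
percentages shift a little depending on which figure is taken as the
baseline and on rounding in the inputs:

```python
# Figures quoted in this thread, in GiB/s.
avg = {"Lyra2": 1.22, "yescrypt": 2.27, "TwoCats": 2.49}
bw = {"Lyra2": 11.4, "yescrypt": 12.12, "TwoCats": 9.97}  # yescrypt at 4 r/w

def pct_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0

avg_gap = pct_more(avg["TwoCats"], avg["yescrypt"])
bw_gap = pct_more(bw["yescrypt"], bw["TwoCats"])
print(f"TwoCats average memory/second is {avg_gap:.1f}% higher than yescrypt's")
print(f"yescrypt memory bandwidth is {bw_gap:.1f}% higher than TwoCats'")
```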

BTW, chances are that going from r=8 to r=16 (already a tunable, like in
scrypt) would increase yescrypt's bandwidth usage (and average and peak
memory usage per second) somewhat further.  I chose r=8 for PHS() because
it increases reliance on fast small random lookups.  r=32 and higher
might be faster or slower, since we're competing for L1 cache space with
pwxform's S-boxes (8 KiB in these tests).
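
To put rough numbers on the L1 cache pressure, here is a sketch assuming
the scrypt-style block size of 128*r bytes and a 32 KiB L1 data cache (the
cache size is my assumption about the CPUs in question, not something
measured here).  Each iteration touches more than one block, so the real
footprint is a multiple of what this prints:

```python
L1D_BYTES = 32 * 1024   # assumption: L1 data cache size on these CPUs
SBOX_BYTES = 8 * 1024   # pwxform S-boxes, as stated for these tests

def block_bytes(r):
    """scrypt-style block size: 128 * r bytes."""
    return 128 * r

for r in (8, 16, 32):
    size = block_bytes(r)
    # Share of the L1 space that remains once the S-boxes are resident.
    share = size / (L1D_BYTES - SBOX_BYTES)
    print(f"r={r:2d}: block = {size // 1024} KiB, "
          f"{share:.1%} of L1 left after S-boxes")
```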

> I ran the same tests on Alexander's Sandy Bridge server, which has more
> memory channels, I think.  For this one-thread case, the results scaled
> almost exactly from the results above, just a bit slower.  On Alexander's
> Haswell machine, Lyra2 catches up a bit, which surprised me, because I have
> the impression that the memory on this machine is slower than the memory on
> mine.  Here's the raw data for Alexander's Haswell box:

I think having more than 2 memory channels is only helpful when you run
more than 1 thread.  As to Ivy Bridge vs. Haswell, I think your earlier
guess about the memory types affecting results significantly might have
been wrong.  They're DDR3-1600 in both cases, right? just with slightly
different timings.  I think Ivy Bridge vs. Haswell itself probably makes
more of a difference, and it is quite possible that Haswell is slower on
some benchmarks.
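
As a sanity check on the memory channel point, the theoretical peak for
DDR3-1600 can be worked out from the transfer rate and the 64-bit channel
width.  This is only a sketch: the function name is made up, the 4-channel
case is an assumption about the server, and real sustained bandwidth is
lower than these peaks:

```python
def ddr_peak_gib_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak: transfers/s * bytes per transfer * channels, in GiB/s."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 2**30

for channels in (1, 2, 4):
    peak = ddr_peak_gib_s(1600, channels)
    print(f"DDR3-1600 x{channels}: {peak:.1f} GiB/s theoretical peak")
```

The one-thread figures in this thread (roughly 10 to 12 GiB/s) are well
below the two-channel peak, which fits with extra channels not helping a
single thread.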

> That's all I have for now!

Thanks!

Alexander