phc-discussions - Re: [PHC] Lyra2 initial review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140417162133.GA10310@openwall.com>
Date: Thu, 17 Apr 2014 20:21:33 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review

Bill,

How did you calculate the "average memory/second" for Lyra2 and
yescrypt.  I don't know about Lyra2's, but yescrypt's looks wrong to me.

I think it is 5/8 of peak, since memory usage is growing linearly for
the first 3/4th of time, and then stays at peak for 1/4th.

On Thu, Apr 17, 2014 at 07:56:55PM +0400, Solar Designer wrote:
> On Thu, Apr 17, 2014 at 11:08:25AM -0400, Bill Cox wrote:
> > Lyra2:
> > Peak memory/second: 1.63 GiB/s
> > Average memory/second: 1.22 GiB/s
> > Memory bandwidth: 11.4 GiB/s  (highly bandwidth limited)
> > 
> > Yescript:
> > Peak memory/second: 3.03 GiB/s
> > Average memory/second: 2.27 GiB/s

This would be: 3.03*5/8 = 1.89 GiB/s

> > Memory bandwidth: 6.06 GiB/s (does it do 2 r/w to DRAM per location, or 3?)
> 
> yescrypt does 4 r/w per location when YESCRYPT_RW is set (and it is set

No, that's wrong: it's 2 r/w per iteration.  It would be 4 r/w per
location if SMix2 ran for as many iterations for SMix1, but in this mode
it does not.  I think correct is:

3.03 * 4/3 * 2 = 8.08 GiB/s

4/3 is ratio of number of iterations to number of locations
2 is number of r/w per iteration

> in the PHS() API), so this would be 12.12 GiB/s.  It does one random
> read and one sequential write per iteration in SMix1, and one random
> read+write per iteration in SMix2.  (Resetting, YESCRYPT_RW brings it to
> only 2 r/w per location like in normal scrypt, but then it'd have to run
> 1.5 times longer to achieve optimal normalized area-time and it'd be
> friendly to TMTO, hence that mode isn't the default.)
> 
> > TwoCats:
> > Peak memory/second: 4.98 GiB/s
> > Average memory/second: 2.49 GiB/s
> > Memory bandwidth: 9.97 GiB/s
> 
> I'd interpret these as TwoCats and yescrypt winning over Lyra2 in terms
> of average memory/second, which I think is the most important of three
> metrics here.  yescrypt loses to TwoCats in terms of this metric by 8.9%,
> but it possibly makes up for that by using 21.6% more memory bandwidth
> (some would view that as a drawback, but in this test we were specifically
> trying to use as much bandwidth from one thread as we could).

The 8.9% was based on your "average memory/second" figures.  With
yescrypt's corrected, TwoCats' advantage is 24%, which is impressive!

I am surprised that Lyra2 and TwoCats seem to win significantly in terms
of bandwidth usage.  I doubt it's the cost of one round of pwxform
(although to find out we can try making it zero rounds, even).  It could
be the cost of r=8.

> BTW, chances are that going from r=8 to r=16 (already a tunable, like in
> scrypt) would increase yescrypt's bandwidth usage (and average and peak
> memory usage per second) some further.  I choose r=8 for PHS() because
> it increases reliance on fast small random lookups.  r=32 and higher
> might be faster or slower, since we're competing for L1 cache space with
> pwxform's S-boxes (8 KiB in these tests).

Alexander