[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOLP8p4n2KO1rC7G-Vc4Q8t5Ob280OL1dPmgRKX3k7=nnKriAw@mail.gmail.com>
Date: Thu, 17 Apr 2014 15:32:25 -0400
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review
On Thu, Apr 17, 2014 at 12:21 PM, Solar Designer <solar@...nwall.com> wrote:
> Bill,
>
> How did you calculate the "average memory/second" for Lyra2 and
> yescrypt. I don't know about Lyra2's, but yescrypt's looks wrong to me.
>
> I think it is 5/8 of peak, since memory usage is growing linearly for
> the first 3/4th of time, and then stays at peak for 1/4th.
You're right. I skimmed the code, and I saw:
/* 6: for i = 0 to N - 1 do */
for (i = 0; i < Nloop; i += 2) {
...
The comment made me think it would execute N-1 times, but that's just the
Script comment. I went back and read your documentation on Nloop in your
writeup, and it's clear enough. I just missed it before.
BTW, I think having a partial second loop is an excellent idea. Your use
of t_cost in this way wins. It's even better than just having an outer
loop like garlic, without increasing memory. I see Lyra2 does something
similar. In both Lyra2 and Yescript, TMTO resistance increases with
t_cost. Very nice.
> On Thu, Apr 17, 2014 at 07:56:55PM +0400, Solar Designer wrote:
> > On Thu, Apr 17, 2014 at 11:08:25AM -0400, Bill Cox wrote:
> > > Lyra2:
> > > Peak memory/second: 1.63 GiB/s
> > > Average memory/second: 1.22 GiB/s
> > > Memory bandwidth: 11.4 GiB/s (highly bandwidth limited)
> > >
> > > Yescript:
> > > Peak memory/second: 3.03 GiB/s
> > > Average memory/second: 2.27 GiB/s
>
> This would be: 3.03*5/8 = 1.89 GiB/s
I agree.
> > > Memory bandwidth: 6.06 GiB/s (does it do 2 r/w to DRAM per location,
> or 3?)
> >
> > yescrypt does 4 r/w per location when YESCRYPT_RW is set (and it is set
>
> No, that's wrong: it's 2 r/w per iteration. It would be 4 r/w per
> location if SMix2 ran for as many iterations for SMix1, but in this mode
> it does not. I think correct is:
>
> 3.03 * 4/3 * 2 = 8.08 GiB/s
>
> 4/3 is ratio of number of iterations to number of locations
> 2 is number of r/w per iteration
You're math looks right here as well. As you pointed out above, your real
bandwidth is higher because of the extra time spent on memory allocation,
and that has the effect of giving Lyra2 a higher bandwidth relative to
Yescrypt and TwoCats than it really has. I'd have to do some more invasive
hacks to get a better estimate.
> > in the PHS() API), so this would be 12.12 GiB/s. It does one random
> > read and one sequential write per iteration in SMix1, and one random
> > read+write per iteration in SMix2. (Resetting, YESCRYPT_RW brings it to
> > only 2 r/w per location like in normal scrypt, but then it'd have to run
> > 1.5 times longer to achieve optimal normalized area-time and it'd be
> > friendly to TMTO, hence that mode isn't the default.)
> >
> > > TwoCats:
> > > Peak memory/second: 4.98 GiB/s
> > > Average memory/second: 2.49 GiB/s
> > > Memory bandwidth: 9.97 GiB/s
> >
> > I'd interpret these as TwoCats and yescrypt winning over Lyra2 in terms
> > of average memory/second, which I think is the most important of three
> > metrics here. yescrypt loses to TwoCats in terms of this metric by 8.9%,
> > but it possibly makes up for that by using 21.6% more memory bandwidth
> > (some would view that as a drawback, but in this test we were
> specifically
> > trying to use as much bandwidth from one thread as we could).
>
> The 8.9% was based on your "average memory/second" figures. With
> yescrypt's corrected, TwoCats' advantage is 24%, which is impressive!
>
> I am surprised that Lyra2 and TwoCats seem to win significantly in terms
> of bandwidth usage. I doubt it's the cost of one round of pwxform
> (although to find out we can try making it zero rounds, even). It could
> be the cost of r=8.
>
> > BTW, chances are that going from r=8 to r=16 (already a tunable, like in
> > scrypt) would increase yescrypt's bandwidth usage (and average and peak
> > memory usage per second) some further. I choose r=8 for PHS() because
> > it increases reliance on fast small random lookups. r=32 and higher
> > might be faster or slower, since we're competing for L1 cache space with
> > pwxform's S-boxes (8 KiB in these tests).
>
> Alexander
>
I tried r=32, and here are the results:
time ./phs-yescrypt 0 16
Allocating 2147504128 bytes
2f c7 c0 a0 1b 0c 6b 82
61 4d 6b c0 02 cb 16 d7
6c 40 2b 60 2b 9b 23 8c
d2 32 7c a3 56 b4 ab 25 32 (octets)
real 0m0.648s
user 0m0.540s
sys 0m0.100s
Using the corrected math you provided, I think the correct numbers are:
Lyra2:
Peak memory/second: 1.52 GiB/s
Average memory/second: 1.14 GiB/s
Memory bandwidth: 10.64 GiB/s
Yescrypt: Salsa20/2, 1 PWXFORM round, r=32
Peak memory/second: 3.09 GiB/s
Average memory/second: 1.93 GiB/s
Memory bandwidth: 8.23 GiB/s
TwoCats: multiplies=1
Peak memory/second: 4.64 GiB/s
Average memory/second: 2.23 GiB/s
Memory bandwidth: 9.28 GiB/s
I agree that average memory*time is the most important metric. TwoCats is
only ahead by about 15% on that scale. If I want more compute time
hardening, I should set multiplies to 2, in which case I'd probably run
slower. In any case, my machine changes it's mind on runtimes by 15%
depending on it's mood it seems, so we're getting down to a level of speed
improvement that even I can't get excited about.
I'm still sticking to my "alternate win condition" based on peak memory *
time :-)
Bill
Content of type "text/html" skipped
Powered by blists - more mailing lists