Message-ID: <20140417155655.GA9178@openwall.com>
Date: Thu, 17 Apr 2014 19:56:55 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review
On Thu, Apr 17, 2014 at 11:08:25AM -0400, Bill Cox wrote:
> Lyra2:
> Peak memory/second: 1.63 GiB/s
> Average memory/second: 1.22 GiB/s
> Memory bandwidth: 11.4 GiB/s (highly bandwidth limited)
>
> Yescript:
> Peak memory/second: 3.03 GiB/s
> Average memory/second: 2.27 GiB/s
> Memory bandwidth: 6.06 GiB/s (does it do 2 r/w to DRAM per location, or 3?)
yescrypt does 4 r/w per location when YESCRYPT_RW is set (and it is set
in the PHS() API), so this would be 12.12 GiB/s. It does one random
read and one sequential write per iteration in SMix1, and one random
read+write per iteration in SMix2. (Resetting YESCRYPT_RW brings it to
only 2 r/w per location like in normal scrypt, but then it'd have to run
1.5 times longer to achieve optimal normalized area-time and it'd be
friendly to TMTO, hence that mode isn't the default.)
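As a rough check of the arithmetic above, here is a minimal sketch. It assumes the benchmark's bandwidth figure is simply the peak fill rate multiplied by the number of r/w per location; that model and the names below are illustrative assumptions, not part of yescrypt or of the benchmark code.

```python
# Assumed model: bandwidth = peak fill rate * accesses per memory location.
# The per-phase counts follow the description above for YESCRYPT_RW mode.
SMIX1_ACCESSES = 1 + 1  # one random read + one sequential write
SMIX2_ACCESSES = 1 + 1  # one random read + one random write


def bandwidth_gib_s(peak_fill_gib_s, accesses_per_location):
    """Estimate memory bandwidth from the peak fill rate (hypothetical helper)."""
    return peak_fill_gib_s * accesses_per_location


YESCRYPT_PEAK = 3.03  # GiB/s, from the quoted results

reported = bandwidth_gib_s(YESCRYPT_PEAK, 2)  # the 2 r/w assumed in the report
corrected = bandwidth_gib_s(YESCRYPT_PEAK, SMIX1_ACCESSES + SMIX2_ACCESSES)

print(f"reported:  {reported:.2f} GiB/s")
print(f"corrected: {corrected:.2f} GiB/s")
```

Under that assumption, doubling the access count from 2 to 4 doubles the 6.06 GiB/s figure to 12.12 GiB/s.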
> TwoCats:
> Peak memory/second: 4.98 GiB/s
> Average memory/second: 2.49 GiB/s
> Memory bandwidth: 9.97 GiB/s
I'd interpret these as TwoCats and yescrypt winning over Lyra2 in terms
of average memory/second, which I think is the most important of the three
metrics here. yescrypt loses to TwoCats in terms of this metric by 8.9%,
but it possibly makes up for that by using 21.6% more memory bandwidth
(some would view that as a drawback, but in this test we were specifically
trying to use as much bandwidth from one thread as we could).
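The two percentages can be reproduced from the rounded figures quoted in this message. The sketch below is only a sanity check: the helper name is hypothetical, and it takes 12.12 GiB/s as yescrypt's corrected bandwidth.

```python
def percent_diff(value, baseline):
    """Signed difference of value relative to baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


# Rounded figures from the quoted results (GiB/s).
YESCRYPT_AVG, TWOCATS_AVG = 2.27, 2.49
YESCRYPT_BW, TWOCATS_BW = 12.12, 9.97  # yescrypt bandwidth after the 4 r/w correction

avg_deficit = -percent_diff(YESCRYPT_AVG, TWOCATS_AVG)
bw_surplus = percent_diff(YESCRYPT_BW, TWOCATS_BW)

print(f"average memory/second deficit: {avg_deficit:.1f}%")
print(f"memory bandwidth surplus:      {bw_surplus:.1f}%")
```

With these rounded inputs the bandwidth surplus comes out at about 21.6%, and the average memory/second deficit at about 8.8%. The small gap from the 8.9% quoted above presumably comes from unrounded measurements.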
BTW, chances are that going from r=8 to r=16 (already a tunable, like in
scrypt) would increase yescrypt's bandwidth usage (and average and peak
memory usage per second) somewhat further. I chose r=8 for PHS() because
it increases reliance on fast small random lookups. r=32 and higher
might be faster or slower, since we're competing for L1 cache space with
pwxform's S-boxes (8 KiB in these tests).
> I ran the same tests on Alexander's Sandy Bridge server, which has more
> memory channels, I think. For this one-thread case, the results scaled
> almost exactly from the results above, just a bit slower. On Alexander's
> Haswell machine, Lyra2 catches up a bit, which surprised me, because I have
> the impression that the memory on this machine is slower than the memory on
> mine. Here's the raw data for Alexander's Haswell box:
I think having more than 2 memory channels is only helpful when you run
more than 1 thread. As to Ivy Bridge vs. Haswell, I think your earlier
guess about the memory types affecting results significantly might have
been wrong. They're DDR3-1600 in both cases, right? Just with slightly
different timings. I think Ivy Bridge vs. Haswell itself probably makes
more of a difference, and it is quite possible that Haswell is slower on
some benchmarks.
> That's all I have for now!
Thanks!
Alexander