Message-ID: <CAOLP8p7KjMGVWn=pgut0p6Pivf+w7jc2336apr+O7DjOt3ox=A@mail.gmail.com>
Date: Thu, 17 Apr 2014 11:08:25 -0400
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review
Here's some raw data comparing best-case single-thread scenarios for
hashing speed of Lyra2, Yescript, and TwoCats. All three are SSE2
optimized versions compiled with:
gcc -Wall -march=native -std=gnu99 -pthread -lcrypto -lm -O3 <files>
I modified TwoCats to use 1 thread in its PHS function (which it should be
doing by default), and reduced multiplications to 1 (meaning 2 multiplications
per 16 bytes hashed) so they would not dominate runtime.
I modified Yescript as Alexander suggested, commenting out 3 of 4 calls in
the Salsa20/8 hashing, turning it into Salsa20/2, and commenting out 5 of 6
calls to PWXFORM_ROUND. I also added printf statements to all 3 just to be
sure we're allocating the same amount of memory. Raw result data below:
*********** Lyra2 - no modifications, hashing 2GiB of memory
PHC> !!
time ./phs-lyra 1 349525
Allocating 2147481600 bytes
2e bf 7b 0b 24 b3 c8 54
68 3b 50 00 90 f4 88 c8
2b 31 cb 26 72 74 62 8b
79 31 0d c3 0e f0 f4 a1 32 (octets)
real 0m1.320s
user 0m1.200s
sys 0m0.110s
*********** Yescript - 5 of 6 calls to PWXFORM_ROUND commented out, and 3
of 4 calls to SALSA20_2ROUNDS
PHC> !!
time ./phs-yescrypt 0 18
Allocating 2147494912 bytes
1c 92 02 2b 21 8d 7a 9d
34 c4 77 26 4b 6a d3 40
b7 96 a5 5f 4b 1f 5a ae
ad 33 81 ef 7a 90 13 89 32 (octets)
real 0m0.709s
user 0m0.560s
sys 0m0.140s
*********** TwoCats - multiples was set to 1, and #threads set to 1
PHC> !time
time ./phs-twocats 0 21
Allocating 2147483648 bytes
58 75 b8 55 94 98 35 a9
3a 88 e3 7b c8 6e af 7d
ab 37 fc 2c 4c b0 6d 40
2e f5 f0 42 68 e2 33 ba 32 (octets)
real 0m0.431s
user 0m0.330s
sys 0m0.080s
This gives the time each algorithm requires to allocate and hash 2GiB of
memory. With Alexander's suggested tweaks, Yescript is faster than Lyra2.
I want this as a parameter to Yescript! Single thread may not be
everything, but it is an important case.
Tabulating what I think this all means:
Lyra2:
    Peak memory/second: 1.63 GB/s
    Average memory/second: 1.22 GB/s
    Memory bandwidth: 11.4 GB/s (highly bandwidth limited)
Yescript:
    Peak memory/second: 3.03 GB/s
    Average memory/second: 2.27 GB/s
    Memory bandwidth: 6.06 GB/s (does it do 2 r/w to DRAM per location, or 3?)
TwoCats:
    Peak memory/second: 4.98 GB/s
    Average memory/second: 2.49 GB/s
    Memory bandwidth: 9.97 GB/s
(Rates are bytes allocated divided by the "real" time, in units of 10^9
bytes per second.)
Lyra2 slammed hard into my memory bandwidth limit, and so it has the lowest
hashing rate of the three. It does 7 memory accesses per memory location
(4 writes, 3 reads), while TwoCats and Yescript do on average 1 read and 1
write (at least if Yescript is behaving like scrypt in this mode). The
average memory*time for Yescript almost beats TwoCats. The reason TwoCats
is ahead on peak memory*time is that TwoCats' second loop continues to fill
more memory, while Lyra2 and Yescript do not.
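For anyone who wants to check the tabulated figures, here is a rough Python sketch of how they seem to be derived from the raw output. It assumes the peak rate is the allocated byte count divided by the "real" time, expressed in 10^9 bytes per second. The average-to-peak ratios (0.75 for Lyra2 and Yescript, 0.5 for TwoCats) are inferred from the table rather than stated anywhere, and the access counts are the 7 and 2 accesses per location mentioned above (2 for Yescript is itself uncertain). The function and field names are made up for illustration.

```python
# Sketch: reproduce the tabulated rates from the first machine's raw output.
# Assumptions (not stated in the post): avg_ratio values are inferred from the
# table, and rates are in units of 10^9 bytes per second.

RUNS = {
    # name: (bytes allocated, "real" seconds, average/peak ratio, accesses per location)
    "Lyra2":    (2147481600, 1.320, 0.75, 7),
    "Yescript": (2147494912, 0.709, 0.75, 2),
    "TwoCats":  (2147483648, 0.431, 0.50, 2),
}


def summarize(nbytes, real_seconds, avg_ratio, accesses):
    """Return (peak, average, bandwidth), each in 10^9 bytes per second."""
    peak = nbytes / real_seconds / 1e9   # memory filled per second of wall time
    average = peak * avg_ratio           # time-averaged memory in use per second
    bandwidth = peak * accesses          # reads plus writes per location
    return peak, average, bandwidth


results = {name: summarize(*params) for name, params in RUNS.items()}

for name, (peak, average, bandwidth) in results.items():
    print(f"{name}: peak {peak:.2f}, average {average:.2f}, bandwidth {bandwidth:.2f}")
```

Changing the access count for Yescript from 2 to 3 in `RUNS` shows what its bandwidth figure would be if it does 3 DRAM accesses per location instead.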
I ran the same tests on Alexander's Sandy Bridge server, which has more
memory channels, I think. For this one-thread case, the results scaled
almost exactly from the results above, just a bit slower. On Alexander's
Haswell machine, Lyra2 catches up a bit, which surprised me, because I have
the impression that the memory on this machine is slower than the memory on
mine. Here's the raw data for Alexander's Haswell box:
*********** Lyra2 - no modifications, hashing 2GiB of memory
time ./phs-lyra 1 349525
Allocating 2147481600 bytes
2e bf 7b 0b 24 b3 c8 54
68 3b 50 00 90 f4 88 c8
2b 31 cb 26 72 74 62 8b
79 31 0d c3 0e f0 f4 a1 32 (octets)
real 0m1.578s
user 0m1.172s
sys 0m0.364s
*********** Yescript - 5 of 6 calls to PWXFORM_ROUND commented out, and 3
of 4 calls to SALSA20_2ROUNDS
PHC> !!
time ./phs-yescrypt 0 18
Allocating 2147494912 bytes
1c 92 02 2b 21 8d 7a 9d
34 c4 77 26 4b 6a d3 40
b7 96 a5 5f 4b 1f 5a ae
ad 33 81 ef 7a 90 13 89 32 (octets)
real 0m1.142s
user 0m0.744s
sys 0m0.352s
*********** TwoCats - multiples was set to 1, and #threads set to 1
time ./phs-twocats 0 21
Allocating 2147483648 bytes
58 75 b8 55 94 98 35 a9
3a 88 e3 7b c8 6e af 7d
ab 37 fc 2c 4c b0 6d 40
2e f5 f0 42 68 e2 33 ba 32 (octets)
real 0m0.717s
user 0m0.308s
sys 0m0.376s
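To put a number on "Lyra2 catches up a bit", here is a small Python sketch comparing the "real" times from the two raw data sets above. It only divides the times reported in this message; the variable names are invented for illustration, and a single run per configuration is not much to go on.

```python
# Sketch: compare "real" times (seconds) between the first machine and the
# Haswell box, using only the figures quoted in this message.

first_machine = {"Lyra2": 1.320, "Yescript": 0.709, "TwoCats": 0.431}
haswell_box = {"Lyra2": 1.578, "Yescript": 1.142, "TwoCats": 0.717}

# How much slower each algorithm ran on the Haswell box.
slowdown = {name: haswell_box[name] / first_machine[name] for name in first_machine}

# How many times longer Lyra2 took than each other algorithm, per machine.
lyra2_gap_first = {
    name: first_machine["Lyra2"] / t for name, t in first_machine.items() if name != "Lyra2"
}
lyra2_gap_haswell = {
    name: haswell_box["Lyra2"] / t for name, t in haswell_box.items() if name != "Lyra2"
}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x slower on the Haswell box")
for name in lyra2_gap_first:
    print(f"Lyra2 vs {name}: {lyra2_gap_first[name]:.2f}x -> {lyra2_gap_haswell[name]:.2f}x")
```

A smaller slowdown factor for Lyra2 than for the other two, and a shrinking gap, is what "catching up" means here.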
That's all I have for now!
Bill