phc-discussions - Re: [PHC] Compute time hardness


Message-ID: <20150401194901.GB15310@openwall.com>
Date: Wed, 1 Apr 2015 22:49:02 +0300
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Compute time hardness

On Wed, Apr 01, 2015 at 03:25:49PM -0300, Marcos Antonio Simplicio Junior wrote:
> > From: "Solar Designer" <solar@...nwall.com>
[...]
> > What are the comparable numbers for Lyra2, POMELO, and Catena? I
> > guess for Lyra2 and Catena it's 8 ADDs + 8 XORs per 128 bytes, right?
> 
> If the underlying sponge is Blake2b, I think that number is correct.

And this applies to Lyra2 as benchmarked e.g. by Milan so far, correct?

> With BlaMka, which replaces simple ADDs in the G function (e.g.,
> a \gets a + b) by Latin Squares (a \gets a+b+2ab), that would be
> 8 MUL + 16 ADDs + 8 XORs (as usual, ignoring the shifts, and
> multiplication by 2, which is also a shift) per 128 bytes.
> 
> Well, that assuming I understood the question correctly ... 

I think you did.  Thank you!

I assume you mean the case SPONGE == 1, where Lyra2 would use
ROUND_LYRA_BLAMKA, not SPONGE == 2 where it'd use HALF_ROUND_LYRA_BLAMKA?

A minor detail: a+b+2ab can be written as (a+b)+2ab, giving a latency
of MUL + just one ADD, not two ADDs (a+b is computed in parallel with
2ab).  So it'd be 8 MUL + 8 ADD + 8 XOR.  Right?
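
A minimal Python sketch of the regrouping, assuming 64-bit wrapping
arithmetic as in Blake2b and the full product 2ab exactly as written
above (real BlaMka implementations may multiply only the low 32-bit
halves of the words).  The function names are hypothetical and serve
only to show that both groupings give the same result while differing
in how many ADDs sit behind the MUL on the critical path:

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit words with wraparound


def latin_serial(a, b):
    """a + b + 2ab with both ADDs chained after the MUL."""
    t = (2 * a * b) & MASK64    # MUL (the doubling is just a shift)
    t = (t + b) & MASK64        # ADD 1: must wait for the MUL
    return (t + a) & MASK64     # ADD 2: must wait for ADD 1


def latin_regrouped(a, b):
    """(a + b) + 2ab: one ADD is independent of the MUL."""
    s = (a + b) & MASK64        # ADD: can execute in parallel with the MUL
    t = (2 * a * b) & MASK64    # MUL
    return (s + t) & MASK64     # the only ADD that waits for the MUL


if __name__ == "__main__":
    print(latin_serial(3, 5), latin_regrouped(3, 5))
```

Both functions return the same value for any inputs; only the
dependency chain differs (MUL + 2 ADDs vs. MUL + 1 ADD).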

That's 8 MULs per 128 bytes, vs. current yescrypt's default of 6 MULs
per 64 bytes (that is, 12 per 128 bytes).  So it's 1.5 times weaker in
terms of sequential MULs per byte processed.  Does Lyra2 with BlaMka run
faster or slower than yescrypt (with pwxform settings as currently
specified) in terms of memory processed per second?  It needs to run 1.5
times faster to win this game. ;-)
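
The 1.5 figure can be checked with a short Python sketch.  The MUL
counts and block sizes are the ones quoted above; the helper name is
hypothetical, and the comparison assumes that sequential MULs per byte
is the only metric of interest:

```python
from fractions import Fraction


def muls_per_byte(muls, block_bytes):
    """Sequential MULs per byte of memory processed."""
    return Fraction(muls, block_bytes)


blamka = muls_per_byte(8, 128)    # Lyra2 with BlaMka: 1/16
yescrypt = muls_per_byte(6, 64)   # yescrypt default pwxform: 3/32

# Memory-throughput advantage Lyra2 with BlaMka would need in order to
# perform as many sequential MULs per second as yescrypt.
required_speedup = yescrypt / blamka

if __name__ == "__main__":
    print(required_speedup)
```

Under these assumptions the ratio is 3/2, matching the 1.5 times
mentioned above.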

BTW, you can take advantage of the (a+b)+2ab latency optimization in
your defensive code as well.  It might cost an extra register and a move
instruction, though, so it isn't universally an improvement.  It might
improve the latency on new CPUs (Ivy Bridge and Bulldozer, and newer)
and make it worse on older CPUs (Sandy Bridge and older when running
non-AVX builds).

Thanks,

Alexander