phc-discussions - Re: [PHC] Lyra2 initial review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOLP8p6dvpx4E9+YmQVNZeY-6O3Yb9BdSjcTWqWjtQnAH6o7oQ@mail.gmail.com>
Date: Thu, 17 Apr 2014 09:08:15 -0400
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review

On Thu, Apr 17, 2014 at 8:46 AM, Solar Designer <solar@...nwall.com> wrote:

> Bill, this sounds cool, but let's not forget that maximizing single
> thread's memory bandwidth usage is a goal only in a subset of use cases
> (in 1 out of 3 categories, as I tried to explain earlier), whereas in
> others it's actually a drawback (it means that other resources are not
> fully made use of for defense when the defender is running more than one
> thread).

Agreed.  This level of analysis is one reason I'm sticking to yescript as
my favorite Script-inspired entry.  However, speed is fun!  Also, most
entries run into problems like cache miss penalties dominating runtime at
large memory sizes, where an attacker could find ways around that.  I would
say that on the whole, the PHC entries are erring on the side of being too
slow and not using enough memory, like you say, there is a point beyond
which speed becomes a negative.

With decent compute-time hardening, using multiple CPUs adds some defense,
especially if you are able to tie up multiple caches, which of all the
entries, only Yescript seems and TwoCats seem able to do well (thanks to
your inputs on TwoCats).  I misused the t_cost parameter to tune cache
bandwidth vs external memory.  I'm hoping to correct that in the tweak
phase where if it's allowed, I'll make t_cost control outer loop
iterations, and change my current timeCost to something like
bandwidthControl.

When comparing against scrypt or yescrypt in terms of single thread's
> memory bandwidth usage, I think we need to compare against reduced
> rounds versions of scrypt and yescrypt.  This means scrypt with 2 rounds
> of Salsa20 (just delete 3 out of 4 uses of SALSA20_2ROUNDS from
> SALSA20_8_BASE in yescrypt-simd.c) and yescrypt native mode with 1 round
> of pwxform (just delete 5 out of 6 uses of PWXFORM_ROUND in PWXFORM).
>
> A hashing scheme pretending to win solely(?) in these terms would need
> to show that it outperforms these trivial modifications of [ye]scrypt.
> And there's not a lot of room to outperform them, even when we do, as
> we're getting quite close to the available bandwidth.
>

I agree.  However, I'm enjoying this metric as my "alternate win
condition".  It's how I entertain myself.  I'm bummed that EARWORM and
Lyra2 beat my memory bandwidth with one thread, but at least I'm writing to
more memory per thread than the rest :-)

> I would actually be (somewhat) interested if you do this sort of
> comparison for Lyra2 vs. scrypt with Salsa20/2 vs. yescrypt with 1 round
> of pwxform (and, if you like, also with Salsa20/2 instead of /8 for the
> final, real crypto sub-block per block).  I wrote "somewhat" because I
> expect all of these to be similar, since we're getting close to full
> memory bandwidth available from one thread in hardware. ;-)
>
> Thank you for helping review PHC candidates!
>
> Alexander
>

Done!  I was looking for a way to procrastinate just a bit longer at work.
 I'll have data probably before lunchtime, at least for my Ivy Bridge
machine.  Done properly, I should test on your high memory bandwidth Sandy
Bridge machine and also your AVX2 Haswell box.  That may take me longer.

Bill

Content of type "text/html" skipped