phc-discussions - Re: [PHC] Lyra2 initial review

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 17 Apr 2014 14:31:55 -0400
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Lyra2 initial review

On Thu, Apr 17, 2014 at 12:18 PM, Marcos Antonio Simplicio Junior <
mjunior@...c.usp.br> wrote:

> Hi again.
>
> ------------------------------
>
> *De: *"Bill Cox" <waywardgeek@...il.com>
> *Para: *discussions@...sword-hashing.net
> *Enviadas: *Quinta-feira, 17 de Abril de 2014 10:08:15
> *Assunto: *Re: [PHC] Lyra2 initial review
>
>
> On Thu, Apr 17, 2014 at 8:46 AM, Solar Designer <solar@...nwall.com>wrote:
>
>> Bill, this sounds cool, but let's not forget that maximizing single
>> thread's memory bandwidth usage is a goal only in a subset of use cases
>> (in 1 out of 3 categories, as I tried to explain earlier), whereas in
>> others it's actually a drawback (it means that other resources are not
>> fully made use of for defense when the defender is running more than one
>> thread).
>
>
> Agreed.  This level of analysis is one reason I'm sticking to yescript as
> my favorite Script-inspired entry.  However, speed is fun!  Also, most
> entries run into problems like cache miss penalties dominating runtime at
> large memory sizes, where an attacker could find ways around that.  I would
> say that on the whole, the PHC entries are erring on the side of being too
> slow and not using enough memory, like you say, there is a point beyond
> which speed becomes a negative.
>
> With decent compute-time hardening, using multiple CPUs adds some defense,
> especially if you are able to tie up multiple caches, which of all the
> entries, only Yescript seems and TwoCats seem able to do well (thanks to
> your inputs on TwoCats).  I misused the t_cost parameter to tune cache
> bandwidth vs external memory.  I'm hoping to correct that in the tweak
> phase where if it's allowed, I'll make t_cost control outer loop
> iterations, and change my current timeCost to something like
> bandwidthControl.
>
> We are still analyzing our "p" parameter to see if we get a good result
> with multiple cores (it is one one of the tweaks we did not have enough
> time to tune in the original submission, but that is described in the
> documentation). This definitely does not mean we will be able to beat
> TwoCats ou Yescrypt in that regard, especially considering that you
> understand better hardware details than I believe we do, but it will be fun
> trying :-)
>

Sweet.  Alexander understands both CPUs and GPUs far better than me.  A lot
of the fun is in trying to keep up with him.  TwoCats wouldn't even be
competitive if it weren't for Alexander's various suggestions.


>
> When comparing against scrypt or yescrypt in terms of single thread's
>> memory bandwidth usage, I think we need to compare against reduced
>> rounds versions of scrypt and yescrypt.  This means scrypt with 2 rounds
>> of Salsa20 (just delete 3 out of 4 uses of SALSA20_2ROUNDS from
>> SALSA20_8_BASE in yescrypt-simd.c) and yescrypt native mode with 1 round
>> of pwxform (just delete 5 out of 6 uses of PWXFORM_ROUND in PWXFORM).
>>
>> A hashing scheme pretending to win solely(?) in these terms would need
>> to show that it outperforms these trivial modifications of [ye]scrypt.
>> And there's not a lot of room to outperform them, even when we do, as
>> we're getting quite close to the available bandwidth.
>>
>
> I agree.  However, I'm enjoying this metric as my "alternate win
> condition".  It's how I entertain myself.  I'm bummed that EARWORM and
> Lyra2 beat my memory bandwidth with one thread, but at least I'm writing to
> more memory per thread than the rest :-)
>
> Well, it is just as trivial to change our "rho" parameter, which controls
> the number of rounds of Blake2b, which is currently hardcoded... We are,
> however, more inclined to verify the usefulness of the "chi" parameter to
> slow things down, since it controls small random reads on the rows used in
> each loop (and, hopefully, that remained on cache). With this, we could
> reduce the memory bandwidth usage, using more of the machine's processing
> budget to provide better GPU resistance.
>
> Finally, I think I prefer the win condition "best TMTO resistance", which
> was one of the things we put more effort at, so, even if we lose, hopefully
> it won't be by too much :-)
>

With t_cost > 1, you'll certainly beat TwoCat's TMTO resistance, and with
the XORing over memory, doing 2 writes for every 1 write TwoCats does, you
likely have much better TMTO resistance with t_cost == 1.

Alexander's time cost parameter is used in a way that seems to increase
memory writes in a manner similar to Lyra2, if I'm not mistaken.  If I
understand things right, there's some ratio of Lyra2 t_cost to Yescript
t_cost where both have roughly equal TMTO resistance.  I opted instead to
have a more complex memory access pattern and to get whatever TMTO
resistance I can in 1 pass.  With the dual-writes in each hashing step, you
already have double the write bandwidth, and if you do multiple overwrite
passes (t_cost > 1, right?) your TMTO resistance goes through the roof.  I
think TwoCat's has what I'll call "good" TMTO resistance, where an attacker
gains very little in terms of time*memory cost (using half the memory
requires around twice the serial computation).  Below that, and his
time*memory cost increases.  This is far from the punishment an attacker
would have if, for example, he tried using 1/2 the memory with t_cost > 1
in Lyra2, if I understand correctly.

Bill

Content of type "text/html" skipped