phc-discussions - Re: [PHC] A review per day

Message-ID: <20140911214503.GA16685@openwall.com>
Date: Fri, 12 Sep 2014 01:45:03 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] A review per day - Lyra2

On Thu, Sep 11, 2014 at 07:35:31PM +0100, Samuel Neves wrote:
> On the 32-bit version of the permutation, it is straightforward to replace vpaddd with vpmulld + vpaddd.

As discussed before, (V)PMULLD has 10 cycles latency on Haswell and 11
cycles on Silvermont/Avoton, as opposed to 5 cycles for other multiply
instructions on the same CPUs.  But I guess this may be acceptable, if
this specific approach has other advantages over possible alternatives.
On most other CPUs, (V)PMULLD is fast.

In yescrypt, though, I chose to avoid the need for this instruction,
opting to rely on 32x32->64 multiplies instead, which is (V)PMULUDQ
(but only half as many of these fit per vector).
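
To make the lane-count trade-off concrete, here is a minimal scalar sketch
in C of what the two instructions compute on one 128-bit vector.  It is
only a model of the per-lane arithmetic, not actual SIMD code and not
taken from yescrypt or Lyra2; the function names mul_lo32x4,
mul_wide32x2, and check_examples are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of (V)PMULLD on a 128-bit vector: four 32-bit lanes,
 * each keeping only the low 32 bits of its 32x32 product. */
static void mul_lo32x4(const uint32_t a[4], const uint32_t b[4],
                       uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i]; /* unsigned multiply wraps modulo 2^32 */
}

/* Scalar model of (V)PMULUDQ on a 128-bit vector: only the even 32-bit
 * lanes (0 and 2) are used as inputs, and each yields a full 64-bit
 * product, so there are two results per vector instead of four. */
static void mul_wide32x2(const uint32_t a[4], const uint32_t b[4],
                         uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = (uint64_t)a[2 * i] * b[2 * i];
}

/* Hypothetical self-check with arbitrary sample values. */
static void check_examples(void)
{
    const uint32_t a[4] = { 0xFFFFFFFFu, 1u, 0x10000u, 3u };
    const uint32_t b[4] = { 2u, 5u, 0x10000u, 7u };
    uint32_t lo[4];
    uint64_t wide[2];

    mul_lo32x4(a, b, lo);
    mul_wide32x2(a, b, wide);

    /* Low halves only: the high bits of lanes 0 and 2 are lost. */
    assert(lo[0] == 0xFFFFFFFEu);
    assert(lo[1] == 5u);
    assert(lo[2] == 0u);
    assert(lo[3] == 21u);

    /* Full products, but only for lanes 0 and 2. */
    assert(wide[0] == 0x1FFFFFFFEull);
    assert(wide[1] == 0x100000000ull);
}
```

With SSE intrinsics, the corresponding operations would be
_mm_mullo_epi32() (SSE4.1) and _mm_mul_epu32() (SSE2), respectively.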

Alexander