phc-discussions - Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)


Message-ID: <535583F5.50200@dei.uc.pt>
Date: Mon, 21 Apr 2014 21:47:49 +0100
From: Samuel Neves <sneves@....uc.pt>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On 21-04-2014 21:35, Solar Designer wrote:
> I just found that AuthenticAMD0700F01_K16_Kabini_InstLatX86.txt (AMD
> A4-5000 APU at 1.5 GHz) lists "PMULUDQ xmm, xmm" and "VPMULUDQ xmm, xmm,
> xmm" as having 2 cycles latency.  While this improvement over 3 cycles
> (best across other CPUs) is almost certainly due to the lower clock
> rate, it's still impressive.  This same file lists scalar *MUL as 3+
> cycles latency, and 4 cycles for high 32 bits.  So the SIMD form of
> 32x32->64 is twice faster than scalar (in terms of latency) on this CPU
> for high 32 bits of result.

I was just looking at the same thing; this AMD chip is very impressive! In particular, it should also have:

 - The fastest AES-GCM speeds (in cycles/byte) of any x86 architecture.
 - The fastest binary elliptic curve scalar multiplication (see [1]) of any x86 architecture.

The reason for this is that 64-bit carryless multiplication, PCLMULQDQ, has twice the throughput and half the latency
on this chip compared to Haswell.
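
For anyone unfamiliar with the operation, here is a minimal Python sketch of what a 64x64->128 carryless multiplication computes: a polynomial product over GF(2), where partial products are combined with XOR instead of addition. This is only a bit-by-bit reference model, not how PCLMULQDQ is implemented or how one would use it in practice; the function name clmul64 is made up for illustration.

```python
def clmul64(a: int, b: int) -> int:
    """Carryless product of two 64-bit operands (result fits in 127 bits)."""
    mask = (1 << 64) - 1
    a &= mask  # treat both inputs as 64-bit values
    b &= mask
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries propagate
        a <<= 1
        b >>= 1
    return result


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 = 0b101
    print(bin(clmul64(0b11, 0b11)))
```

Both GHASH in AES-GCM and binary-field arithmetic for elliptic curves are built on this operation, which is why its latency and throughput matter so much for the two items above.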

[1] https://eprint.iacr.org/2013/131