Date: Mon, 21 Apr 2014 21:47:49 +0100
From: Samuel Neves <sneves@....uc.pt>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On 21-04-2014 21:35, Solar Designer wrote:
> I just found that AuthenticAMD0700F01_K16_Kabini_InstLatX86.txt (AMD
> A4-5000 APU at 1.5 GHz) lists "PMULUDQ xmm, xmm" and "VPMULUDQ xmm, xmm,
> xmm" as having 2 cycles latency.  While this improvement over 3 cycles
> (best across other CPUs) is almost certainly due to the lower clock
> rate, it's still impressive.  This same file lists scalar *MUL as 3+
> cycles latency, and 4 cycles for high 32 bits.  So the SIMD form of
> 32x32->64 is twice faster than scalar (in terms of latency) on this CPU
> for high 32 bits of result.

I was just looking at the same thing; this AMD chip is very impressive! In particular, it should also have:

 - The fastest AES-GCM speeds (in cycles/byte) of any x86 architecture.
 - The fastest binary elliptic curve scalar multiplication (see [1]) of any x86 architecture.

The reason is that 64-bit carryless multiplication (PCLMULQDQ) on this chip has twice the throughput and half the
latency that it has on Haswell.
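
For anyone unfamiliar with the operation, here is a minimal Python sketch of what a 64x64->128 carryless multiplication computes: the product of two polynomials over GF(2), where partial products are combined with XOR rather than addition. The function name clmul64 is just an illustrative choice, and this bit-by-bit loop is only a reference model of the result; it says nothing about how the hardware implements the instruction or about its latency.

```python
def clmul64(a: int, b: int) -> int:
    """Carryless (GF(2)[x]) product of two 64-bit operands.

    Reference model only: the result fits in 127 bits, matching the
    width of a 64x64 carryless multiply.
    """
    if not (0 <= a < 1 << 64 and 0 <= b < 1 << 64):
        raise ValueError("operands must fit in 64 bits")
    result = 0
    while b:
        if b & 1:
            # XOR instead of ADD: no carries propagate between bit positions.
            result ^= a
        a <<= 1
        b >>= 1
    return result
```

For example, clmul64(3, 3) is 5, since (x + 1)^2 = x^2 + 1 over GF(2), whereas the ordinary integer product is 9. This is the primitive underlying both the GHASH part of AES-GCM and binary-field elliptic curve arithmetic.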

[1] https://eprint.iacr.org/2013/131
