Message-ID: <535583F5.50200@dei.uc.pt>
Date: Mon, 21 Apr 2014 21:47:49 +0100
From: Samuel Neves <sneves@....uc.pt>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)
On 21-04-2014 21:35, Solar Designer wrote:
> I just found that AuthenticAMD0700F01_K16_Kabini_InstLatX86.txt (AMD
> A4-5000 APU at 1.5 GHz) lists "PMULUDQ xmm, xmm" and "VPMULUDQ xmm, xmm,
> xmm" as having 2 cycles latency. While this improvement over 3 cycles
> (best across other CPUs) is almost certainly due to the lower clock
> rate, it's still impressive. This same file lists scalar *MUL as 3+
> cycles latency, and 4 cycles for high 32 bits. So the SIMD form of
> 32x32->64 is twice faster than scalar (in terms of latency) on this CPU
> for high 32 bits of result.
I was just looking at the same thing; this AMD chip is very impressive! In particular, it should also have:
- The fastest AES-GCM speeds (in cycles/byte) of any x86 architecture.
- The fastest binary elliptic curve scalar multiplication (see [1]) of any x86 architecture.
The reason is that on this chip the 64-bit carry-less multiplication instruction, PCLMULQDQ, has twice the throughput and
half the latency that it has on Haswell.
[1] https://eprint.iacr.org/2013/131
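
For readers unfamiliar with carry-less multiplication, here is a minimal Python sketch of the arithmetic that a 64x64->128-bit carry-less multiply performs. The function name clmul64 is made up for illustration; this models only the mathematical result (polynomial multiplication over GF(2)), not the instruction's operand selection or its performance.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit values (hypothetical helper).

    Each operand is treated as a polynomial over GF(2), so partial
    products are combined with XOR instead of addition with carries.
    The result fits in 127 bits.
    """
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # "add" the shifted multiplicand without carries
        a <<= 1
        b >>= 1
    return result


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), since the 2x term cancels
    print(bin(clmul64(0b11, 0b11)))
```

This is the core operation in both GHASH (the authentication part of AES-GCM) and binary-field elliptic curve arithmetic, which is why its latency and throughput matter for the two items listed above.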