lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 13 Feb 2014 23:37:45 +0400
From: Solar Designer <>
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On Sun, Feb 09, 2014 at 05:47:29AM +0000, Samuel Neves wrote:
> I managed to shave one instruction (cycle) off in two different ways,
> one requiring SSE4.1, the other only SSE2. Both run at same speed.


> One note: since we're doing two PMULUDQ, and they cannot be dispatched
> in the same cycle, the lower bound for the latency of any accurate
> emulation is 6 cycles.

Good point.  However, when we have a sequence of multiple emulated
_mm_mullo_epi32()'s, this extra cycle may be incurred only once per
sequence rather than once per _mm_mullo_epi32().

It's similar to how the scalar widening MUL has (on some CPUs) 1 cycle
higher latency for edx (or rdx) than for eax (or rax).  In the table I
posted, I included the higher latency, but we may have in mind that in
some instruction sequences the lower latency will sometimes apply.


Powered by blists - more mailing lists