Open Source and information security mailing list archives
 
Date: Thu, 30 Jan 2014 04:18:37 +0000
From: Samuel Neves <sneves@....uc.pt>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Combining sliding window and bit-reversal

On 30-01-2014 03:34, Solar Designer wrote:
>
>> By definition only the most significant bit of each byte is looked at in
>> PSHUFB, so it's OK to share the same selection mask. I've additionally
>> used PSHUFB to compute the bit reversal of 4-bit values, which should be
>> somewhat faster. Here it is:
> Thanks!  Compared to yours, my SSSE3 code looks naive.
>
> This is in the public domain, correct?  (My bitrev.c is.)

Correct. I've placed a cleaned up version at
https://gist.github.com/sneves/8702353 with disclaimers and attributions.
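
For readers who do not have the gist at hand, the nibble-lookup idea
mentioned in the quoted text can be modeled in scalar C. This is only a
sketch of the technique, not the code from the gist: the names rev4 and
bitrev8 are made up for illustration, and a plain array lookup stands in
for what PSHUFB would do on 16 bytes at once.

```c
#include <stdint.h>

/* Bit reversals of the 16 possible 4-bit values. In an SSSE3 version this
   table would sit in an XMM register and PSHUFB would do the lookups
   (assumption: illustrative names, not taken from the gist). */
static const uint8_t rev4[16] = {
    0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
    0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

/* Reverse the bits of one byte: reverse each nibble via the table,
   then swap the two nibbles. */
static uint8_t bitrev8(uint8_t x)
{
    return (uint8_t)((rev4[x & 0x0F] << 4) | rev4[x >> 4]);
}
```

For example, bitrev8(0x1E) yields 0x78 (00011110 becomes 01111000).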

>
>>     t3 = _mm_srli_epi32(_mm_andnot_si128(c0f, t1), 4); // x >> 4
> I guess the reason why you use ANDN before the shift rather than AND
> after it is to better accommodate the shift's higher latency, correct?

The reason was just to avoid using a new constant 0xF0. In hindsight
your suggestion also works: an AND after the shift can reuse the same
0x0F mask.
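
To make the two variants concrete, here is a minimal sketch using only
SSE2 intrinsics (so it assumes an x86 target with SSE2). The function
names and the same128 helper are made up for illustration; only the
ANDN-then-shift expression comes from the quoted line. Both forms
produce each byte's high nibble moved into the low nibble position.

```c
#include <emmintrin.h> /* SSE2 */

/* ANDN first, then shift: clear the low nibbles with the 0x0F mask
   (andnot computes ~mask & x), then shift each 32-bit lane right by 4. */
static __m128i hi_nibbles_andn_first(__m128i x)
{
    const __m128i c0f = _mm_set1_epi8(0x0F);
    return _mm_srli_epi32(_mm_andnot_si128(c0f, x), 4);
}

/* Shift first, then AND: the shift drags the neighbouring byte's low
   nibble into each byte's high nibble; the 0x0F mask then discards it. */
static __m128i hi_nibbles_shift_first(__m128i x)
{
    const __m128i c0f = _mm_set1_epi8(0x0F);
    return _mm_and_si128(_mm_srli_epi32(x, 4), c0f);
}

/* Hypothetical helper: nonzero if all 16 bytes of a and b are equal. */
static int same128(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Neither variant needs a constant other than 0x0F; they differ only in
which instruction the dependency chain starts with.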

The shift and the ANDN have the same latency (1 cycle), but whereas the
ANDN can be dispatched to 3 different execution ports, the shift can only
be dispatched to one. I doubt it makes any difference here (that is, the
latency is 5 cycles on Intel Sandy Bridge/Ivy Bridge either way), but
enabling the 2 ANDNs to run concurrently could be an advantage in some
cases.

