lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Sat, 22 Mar 2014 23:16:55 +0400
From: Solar Designer <>
Subject: TwoCats multiplication chain


In current TwoCats, you have:

                // Compute the multiplication chain
                for(uint32_t k = 0; k < multiplies; k++) {
                    v = (uint32_t)v * (uint64_t)oddState[k];
                    v ^= randVal;
                    randVal += v >> 32;

I think this has twice lower ASIC attack latency than what you
probably expected.  Here are some test questions: can the next
multiplication start being computed before the previous one completes?
Can this reduce total latency for two multiplications, and by how much?

In a 32x32 multiplier, the highest latency is for bit 31 of the result.
I look at these diagrams when considering what happens there:

In your code, only one of the partial products for bit 31 depends on the
previous iteration's bit 31 of v.  Thus, almost all of the partial
products can start being computed before bit 31 of the previous
iteration's v is ready.

I think "randVal += v >> 32;" limits the speedup from such overlapped
computation to a factor of 2, since each bit from 32 to 63 depends on
bit 31 of the multiplicand, and this addition with carry makes all of
them potentially propagate to bit 31 of randVal, and thus to v, but this
final propagation only happens with a delay of one loop iteration.

I think changing the multiplication chain to:

                for(uint32_t k = 0; k < multiplies; k++) {
                    v = (uint32_t)v * (uint64_t)oddState[k];
                    randVal += v >> 32;
                    v ^= randVal;

defeats this attack.  Unfortunately, this has slightly lower efficiency
for the defender (more time is spent on latencies of non-multiplication
operations, which don't cost as much latency in ASIC).

On a related note, now that you've moved to 32x32->64, which is
lossless, I think you don't need oddState[] anymore (you can use state[]
directly, without the "| 1").


Powered by blists - more mailing lists