lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Sat, 22 Mar 2014 23:16:55 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: TwoCats multiplication chain

Bill,

In current TwoCats, you have:

                // Compute the multiplication chain
                for(uint32_t k = 0; k < multiplies; k++) {
                    v = (uint32_t)v * (uint64_t)oddState[k];
                    v ^= randVal;
                    randVal += v >> 32;
                }

I think this has twice lower ASIC attack latency than what you
probably expected.  Here are some test questions: can the next
multiplication start being computed before the previous one completes?
Can this reduce total latency for two multiplications, and by how much?

In a 32x32 multiplier, the highest latency is for bit 31 of the result.
I look at these diagrams when considering what happens there:

http://www10.edacafe.com/book/ASIC/Book/Book/Book/CH02/CH02.6.php?interstitial_displayed=Yes#pgfId=167904

In your code, only one of the partial products for bit 31 depends on the
previous iteration's bit 31 of v.  Thus, almost all of the partial
products can start being computed before bit 31 of the previous
iteration's v is ready.

I think "randVal += v >> 32;" limits the speedup from such overlapped
computation to a factor of 2, since each bit from 32 to 63 depends on
bit 31 of the multiplicand, and this addition with carry makes all of
them potentially propagate to bit 31 of randVal, and thus to v, but this
final propagation only happens with a delay of one loop iteration.

I think changing the multiplication chain to:

                for(uint32_t k = 0; k < multiplies; k++) {
                    v = (uint32_t)v * (uint64_t)oddState[k];
                    randVal += v >> 32;
                    v ^= randVal;
                }

defeats this attack.  Unfortunately, this has slightly lower efficiency
for the defender (more time is spent on latencies of non-multiplication
operations, which don't cost as much latency in ASIC).

On a related note, now that you've moved to 32x32->64, which is
lossless, I think you don't need oddState[] anymore (you can use state[]
directly, without the "| 1").

Alexander

Powered by blists - more mailing lists