| lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
|
Open Source and information security mailing list archives
| ||
|
Message-ID: <CAOLP8p4iEKiBVTVPUQCnfBFSsgYDOMXZbmN9cMtibQ_+G1jW=w@mail.gmail.com> Date: Sat, 8 Feb 2014 23:26:36 -0500 From: Bill Cox <waywardgeek@...il.com> To: discussions@...sword-hashing.net Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission) On Sat, Feb 8, 2014 at 10:16 PM, Samuel Neves <sneves@....uc.pt> wrote: > On 09-02-2014 02:52, Bill Cox wrote: >> However, this constant is 0x6C078965. A simple way to compute it is >> just to add the 14 places that 1's occur. There are 5 places where >> 1's occur together, so the same adder output can be reused in all 5 >> places, and we've got 8 more additions to do. Worst case, that's 9 >> adders, where with Booth encoding for non-constant multiplication, it >> would be 16. I'm running a fun little program I just wrote to find >> the optimal number but it may not finish before the PHC winner is >> chosen... I'm feeling to lazy to optimize it. It's called >> "constmult", and I just checked it in under noelkdf. > > This is a well-known variant of the addition-chain problem. SPIRAL [1] > generates the following code sequence, using 7 additions: > > // > ----------------------------------------------------------------------------- > // This code was generated by Spiral Multiplier Block Generator, > www.spiral.net > // Copyright (c) 2006, Carnegie Mellon University > // All rights reserved. > // The generated code is distributed under a BSD style license > // (see http://www.opensource.org/licenses/bsd-license.php)// > ----------------------------------------------- > // Cost: 7 adds/subtracts 7 shifts 0 negations > // Input: > // int t0 > // Outputs: > // int t1 = 1812433253 * t0 > // ----------------------------------------------- > t3 = shl(t0, 2); > t2 = add(t0, t3); > t5 = shl(t0, 5); > t4 = add(t2, t5); > t7 = shl(t4, 6); > t6 = add(t4, t7); > t9 = shl(t2, 26); > t8 = sub(t9, t6); > t11 = shl(t0, 4); > t10 = sub(t11, t0); > t13 = shl(t0, 16); > t12 = add(t10, t13); > t14 = shl(t12, 15); > t1 = sub(t14, t8); > > [1] http://spiral.ece.cmu.edu/mcm/gen.html Nice! So, 7 adder/subtractors. By the way, if you're familiar with results from skilled hand layout of such circuits in advanced IC processes, I'd love to hear your thoughts on Salsa20/8 latency with a custom hand-optimized layout in the fastest processes. I'm guessing < 1ns, but I'm not an expert here. I'm basing that on a comparison to my 32x32 multiplier in my Intel quad-core i7. Basically, if I haven't made a mistake (and I do make a lot), there are 4 add/xor levels of logic per 2 rounds of Salsa, so Salsa20/8 is 16 levels of add + 16 levels of XOR for each of it's 16 32-bit registers. This seems comparable to 16 32x32 multipliers running in parallel to me. On my "development machine" (my son's MineCraft server) 32x32 multiplication runs in 0.9ns. I'll stick to two variable multiplications for now in the hash function, and avoid multiplication by a constant. The NoelKDF multiplication-hardened hash function has survived a whole week :-) Steve pretty much demolished the first version where I wasn't hashing the final result with a crypto-strength hash, with his awesome modulo-4 attack, but since including Steve's suggestions, the code's outer loop has lasted 3 days! Whoo hoo! :-p Bill
Powered by blists - more mailing lists