lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 8 Feb 2014 23:26:36 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On Sat, Feb 8, 2014 at 10:16 PM, Samuel Neves <sneves@....uc.pt> wrote:
> On 09-02-2014 02:52, Bill Cox wrote:
>> However, this constant is 0x6C078965.  A simple way to compute it is
>> just to add the 14 places that 1's occur.  There are 5 places where
>> 1's occur together, so the same adder output can be reused in all 5
>> places, and we've got 8 more additions to do.  Worst case, that's 9
>> adders, where with Booth encoding for non-constant multiplication, it
>> would be 16.  I'm running a fun little program I just wrote to find
>> the optimal number but it may not finish before the PHC winner is
>> chosen... I'm feeling to lazy to optimize it.  It's called
>> "constmult", and I just checked it in under noelkdf.
>
> This is a well-known variant of the addition-chain problem. SPIRAL [1]
> generates the following code sequence, using 7 additions:
>
> //
> -----------------------------------------------------------------------------
> // This code was generated by Spiral Multiplier Block Generator,
> www.spiral.net
> // Copyright (c) 2006, Carnegie Mellon University
> // All rights reserved.
> // The generated code is distributed under a BSD style license
> // (see http://www.opensource.org/licenses/bsd-license.php)//
> -----------------------------------------------
> // Cost: 7 adds/subtracts 7 shifts 0 negations
> // Input:
> //   int t0
> // Outputs:
> //   int t1 = 1812433253 * t0
> // -----------------------------------------------
> t3 = shl(t0, 2);
> t2 = add(t0, t3);
> t5 = shl(t0, 5);
> t4 = add(t2, t5);
> t7 = shl(t4, 6);
> t6 = add(t4, t7);
> t9 = shl(t2, 26);
> t8 = sub(t9, t6);
> t11 = shl(t0, 4);
> t10 = sub(t11, t0);
> t13 = shl(t0, 16);
> t12 = add(t10, t13);
> t14 = shl(t12, 15);
> t1 = sub(t14, t8);
>
> [1] http://spiral.ece.cmu.edu/mcm/gen.html

Nice!  So, 7 adder/subtractors.  By the way, if you're familiar with
results from skilled hand layout of such circuits in advanced IC
processes, I'd love to hear your thoughts on Salsa20/8 latency with a
custom hand-optimized layout in the fastest processes.  I'm guessing <
1ns, but I'm not an expert here.  I'm basing that on a comparison to
my 32x32 multiplier in my Intel quad-core i7.  Basically, if I haven't
made a mistake (and I do make a lot), there are 4 add/xor levels of
logic per 2 rounds of Salsa, so Salsa20/8 is 16 levels of add + 16
levels of XOR for each of it's 16 32-bit registers.  This seems
comparable to 16 32x32 multipliers running in parallel to me.  On my
"development machine" (my son's MineCraft server) 32x32 multiplication
runs in 0.9ns.

I'll stick to two variable multiplications for now in the hash
function, and avoid multiplication by a constant.  The NoelKDF
multiplication-hardened hash function has survived a whole week :-)
Steve pretty much demolished the first version where I wasn't hashing
the final result with a crypto-strength hash, with his awesome
modulo-4 attack, but since including Steve's suggestions, the code's
outer loop has lasted 3 days!  Whoo hoo! :-p

Bill

Powered by blists - more mailing lists