phc-discussions - Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOLP8p4iEKiBVTVPUQCnfBFSsgYDOMXZbmN9cMtibQ_+G1jW=w@mail.gmail.com>
Date: Sat, 8 Feb 2014 23:26:36 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On Sat, Feb 8, 2014 at 10:16 PM, Samuel Neves <sneves@....uc.pt> wrote:
> On 09-02-2014 02:52, Bill Cox wrote:
>> However, this constant is 0x6C078965.  A simple way to compute it is
>> just to add the 14 places that 1's occur.  There are 5 places where
>> 1's occur together, so the same adder output can be reused in all 5
>> places, and we've got 8 more additions to do.  Worst case, that's 9
>> adders, where with Booth encoding for non-constant multiplication, it
>> would be 16.  I'm running a fun little program I just wrote to find
>> the optimal number but it may not finish before the PHC winner is
>> chosen... I'm feeling to lazy to optimize it.  It's called
>> "constmult", and I just checked it in under noelkdf.
>
> This is a well-known variant of the addition-chain problem. SPIRAL [1]
> generates the following code sequence, using 7 additions:
>
> //
> -----------------------------------------------------------------------------
> // This code was generated by Spiral Multiplier Block Generator,
> www.spiral.net
> // Copyright (c) 2006, Carnegie Mellon University
> // All rights reserved.
> // The generated code is distributed under a BSD style license
> // (see http://www.opensource.org/licenses/bsd-license.php)//
> -----------------------------------------------
> // Cost: 7 adds/subtracts 7 shifts 0 negations
> // Input:
> //   int t0
> // Outputs:
> //   int t1 = 1812433253 * t0
> // -----------------------------------------------
> t3 = shl(t0, 2);
> t2 = add(t0, t3);
> t5 = shl(t0, 5);
> t4 = add(t2, t5);
> t7 = shl(t4, 6);
> t6 = add(t4, t7);
> t9 = shl(t2, 26);
> t8 = sub(t9, t6);
> t11 = shl(t0, 4);
> t10 = sub(t11, t0);
> t13 = shl(t0, 16);
> t12 = add(t10, t13);
> t14 = shl(t12, 15);
> t1 = sub(t14, t8);
>
> [1] http://spiral.ece.cmu.edu/mcm/gen.html

Nice!  So, 7 adder/subtractors.  By the way, if you're familiar with
results from skilled hand layout of such circuits in advanced IC
processes, I'd love to hear your thoughts on Salsa20/8 latency with a
custom hand-optimized layout in the fastest processes.  I'm guessing <
1ns, but I'm not an expert here.  I'm basing that on a comparison to
my 32x32 multiplier in my Intel quad-core i7.  Basically, if I haven't
made a mistake (and I do make a lot), there are 4 add/xor levels of
logic per 2 rounds of Salsa, so Salsa20/8 is 16 levels of add + 16
levels of XOR for each of it's 16 32-bit registers.  This seems
comparable to 16 32x32 multipliers running in parallel to me.  On my
"development machine" (my son's MineCraft server) 32x32 multiplication
runs in 0.9ns.

I'll stick to two variable multiplications for now in the hash
function, and avoid multiplication by a constant.  The NoelKDF
multiplication-hardened hash function has survived a whole week :-)
Steve pretty much demolished the first version where I wasn't hashing
the final result with a crypto-strength hash, with his awesome
modulo-4 attack, but since including Steve's suggestions, the code's
outer loop has lasted 3 days!  Whoo hoo! :-p

Bill