lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 8 Feb 2014 21:52:14 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

<solar@...nwall.com> wrote:
> On Fri, Feb 07, 2014 at 05:32:45PM -0500, Bill Cox wrote:
>> On Fri, Feb 7, 2014 at 1:35 PM, Steve Thomas <steve@...tu.com> wrote:
>> > value = ROTATE_LEFT(value*(mem[prevAddr + i] | 3) + mem[fromAddr + i], 1);
>>
>> It's still a little creepy having that modulo 4 math eliminating the
>> multiply.  Gary sent a quick response to my initial NoelKDF hash
>> function, where he thought the OR was an AND, and his complaint that
>> an attacker could skip the multiply wasn't as far off as I'd thought.
>> If anyone can think of a reason to worry about it, I'll put in the
>> rotate, or something like it.  I do want to keep the number of
>> non-multiply operations to a minimum.
>
> As an option, you might want to reuse the operations and the constant
> used in Mersenne twister's initialization:
>
> #define NEXT_STATE(x, i) \
>         (x) = 1812433253U * ((x) ^ ((x) >> 30)) + (i);
>
> where "x" is the current state (32-bit) and "i" is the iteration number.
>
> If there's a shortcut to evaluating multiple iterations of this, it
> would also be useful in mt_rand() seed crackers, and so far I'm not
> aware of any implementing any such shortcut (this, of course, is no
> proof that no such shortcut exists - yet it is something).
>
> A drawback is that a multiplier by a (good) constant probably takes
> twice less die area in ASIC than an arbitrary 32x32 multiplier does - do
> I guess correctly?  I guess you're the best person in here to confirm
> this. :-)  Does such specialization of a multiplier reduce its latency?

Yes, the circuit is a lot smaller, but worse, it's faster.  I hope I'm
not the best guy here to evaluate this because I'm no digital IC
circuit design guru.  I'm one of those guys who's good at just about
everything (password hashing anyone?  how about speech acceleration
algorithms?) but not the best at much.  Guys who do hand-designed
multipliers for a living are the right guys.

However, this constant is 0x6C078965.  A simple way to compute it is
just to add the 14 places that 1's occur.  There are 5 places where
1's occur together, so the same adder output can be reused in all 5
places, and we've got 8 more additions to do.  Worst case, that's 9
adders, where with Booth encoding for non-constant multiplication, it
would be 16.  I'm running a fun little program I just wrote to find
the optimal number but it may not finish before the PHC winner is
chosen... I'm feeling to lazy to optimize it.  It's called
"constmult", and I just checked it in under noelkdf.

On Sat, Feb 8, 2014 at 3:31 AM, Solar Designer <solar@...nwall.com> wrote:
> Bill,
>
> On Sat, Feb 08, 2014 at 12:00:30PM +0400, Solar Designer wrote:
>> As an option, you might want to reuse the operations and the constant
>> used in Mersenne twister's initialization:
>>
>> #define NEXT_STATE(x, i) \
>>       (x) = 1812433253U * ((x) ^ ((x) >> 30)) + (i);
>>
>> where "x" is the current state (32-bit) and "i" is the iteration number.
>
> As a less conservative option, you may keep the non-linear operations
> from MT, but replace the constant with your "mem[] | 3" or similar.
>
> If you need to reference two memory locations on each iteration, like
> you currently do, then maybe take this construction from init_by_array()
> from mt19937ar.c:
>
>         mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL))
>           + init_key[j] + j; /* non linear */
>
> and modify it as follows:
>
>         value = ((value ^ (value >> 30)) * (mem[prevAddr + i] | 3))
>           + mem[fromAddr + i] + i;
>
> (totally untested).
>
> Alexander

I like this last one.  This should get rid of the modulo 4
simplifications.  It adds another operation in the inner loop, and
it's the worst kind: the right-shift requires nothing but wires in an
ASIC.  Still, this is what I'll try next if needed.

Thanks again, Alexander.

Bill

Powered by blists - more mailing lists