Message-ID: <20140214105946.GA9100@openwall.com>
Date: Fri, 14 Feb 2014 14:59:46 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

Bill,

On Fri, Feb 14, 2014 at 03:51:02AM +0400, Solar Designer wrote:
> On Thu, Feb 13, 2014 at 05:36:26PM -0500, Bill Cox wrote:
> > I just tested the impact of unpredictable lookups on NoelKDF's hash
> > function. Instead of hashing:
> >
> > value = value*(*prev++ | 3) + *from++;
> > *to++ = value;
> >
> > I tried:
> >
> > value = value*(*prev++ | 3) + *(from + (value & mask));
> > *to++ = value;
> >
> > This makes the next "from" address unpredictable and dependent on the
> > next value of value (I really need to rename this). It slowed down
> > single-threaded hashing of 2GB from 0.76 seconds to 2 seconds. If
> > instead of unpredictable, all I need is pseudo-randomized, I got about
> > 0.95 seconds when I replaced *from++ with *(from + (*prev & mask)).
> >
> > It looks like there is a penalty to pay for unpredictable lookups. Is
> > (*prev & mask) enough to cause problems for GPUs, or since *prev is
> > calculated many cycles earlier, do I really need (value & mask)?
>
> In order to have any assurance of being no weaker than bcrypt, you do
> need (value & mask). I am surprised that you're seeing this much of a
> performance impact given that the random lookup and the multiplication
> can proceed in parallel (right?) [...]
>
> I am also surprised. Are you sure your "mask" is small enough that this
> fits in L1 cache? Are you sure the data was already in L1 cache?

Actually, I think your "from" is a random page, likely not in L1 cache
by the time you reach this code. I think you wanted to replace *prev++
with *(prev + (value & mask)), and leave reads from "from" sequential
(since they usually go from L2 or worse). This explains the worse than
2x slowdown you've observed. I'd expect smaller slowdown with the
change I suggested above.

Alexander
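
For readers following the thread, below is a minimal self-contained C
sketch contrasting the three inner-loop variants discussed above:
Bill's original sequential loop, his value-dependent (unpredictable)
lookup into "from", and Alexander's suggestion of moving the random
lookup to "prev" while keeping "from" sequential. Only the three loop
bodies come from the thread; everything else (BLOCK_WORDS, MASK, the
function names, and the test harness) is a hypothetical setup for
illustration, not code from the NoelKDF submission.

/*
 * Sketch of the loop variants from the thread. The loop bodies are
 * quoted from the messages above; sizes and names are assumptions.
 */
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_WORDS 4096               /* words per block (assumed) */
#define MASK (BLOCK_WORDS - 1)         /* the "mask" from the thread */

/* Variant 1: fully sequential reads from both "prev" and "from". */
static uint32_t mix_sequential(const uint32_t *prev, const uint32_t *from,
                               uint32_t *to, uint32_t value)
{
    for (int i = 0; i < BLOCK_WORDS; i++) {
        value = value*(*prev++ | 3) + *from++;
        *to++ = value;
    }
    return value;
}

/* Variant 2: the "from" index depends on the value just computed, so
 * the next load address is unpredictable (the slow case Bill measured,
 * 0.76 s -> 2 s for 2 GB single-threaded). */
static uint32_t mix_unpredictable(const uint32_t *prev, const uint32_t *from,
                                  uint32_t *to, uint32_t value)
{
    for (int i = 0; i < BLOCK_WORDS; i++) {
        value = value*(*prev++ | 3) + *(from + (value & MASK));
        *to++ = value;
    }
    return value;
}

/* Variant 3 (Alexander's suggestion): put the value-dependent lookup
 * into "prev", which is small enough to be L1-resident, and keep the
 * reads from "from" sequential, since a random "from" page is likely
 * not in L1 anyway. */
static uint32_t mix_prev_lookup(const uint32_t *prev, const uint32_t *from,
                                uint32_t *to, uint32_t value)
{
    for (int i = 0; i < BLOCK_WORDS; i++) {
        value = value*(*(prev + (value & MASK)) | 3) + *from++;
        *to++ = value;
    }
    return value;
}

int main(void)
{
    uint32_t *prev = malloc(BLOCK_WORDS * sizeof(*prev));
    uint32_t *from = malloc(BLOCK_WORDS * sizeof(*from));
    uint32_t *to = malloc(BLOCK_WORDS * sizeof(*to));
    if (!prev || !from || !to)
        return 1;

    /* Arbitrary deterministic fill, just so the loops have input. */
    for (int i = 0; i < BLOCK_WORDS; i++) {
        prev[i] = 0x9e3779b9u * (uint32_t)i;
        from[i] = 0x85ebca6bu * (uint32_t)i;
    }

    uint32_t v = 1;
    v = mix_sequential(prev, from, to, v);
    v = mix_unpredictable(prev, from, to, v);
    v = mix_prev_lookup(prev, from, to, v);
    printf("final value: %08" PRIx32 "\n", v);

    free(prev);
    free(from);
    free(to);
    return 0;
}

Note how variants 2 and 3 differ only in which array absorbs the
value-dependent index: the point of the suggestion is that a random
offset into an L1-resident block is cheap, while a random offset into
a freshly selected 2 GB region almost always misses cache.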