Message-ID: <20140214111818.GA9159@openwall.com>
Date: Fri, 14 Feb 2014 15:18:18 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On Thu, Feb 13, 2014 at 11:00:11PM -0500, Bill Cox wrote:
> for(j = 0; j < blocklen; j++) {
>     uint32_t *from = mem + (value & mask);
>     //uint32_t *from = mem + j;
>     value = (value * (*prev++ | 3)) + *from;
>     *to++ = value;
> }
[...]
> I can't explain why using value to compute the next address is taking
> longer. It doesn't make sense to me. Do you see any problems in my
> test code?

I took a look at the code gcc generated for me from your test program.
The problem is that on 2-op archs such as x86, "value & mask" either
involves an extra MOV or has to be done after the IMUL: the AND
clobbers value, which the IMUL still needs as a source, so without a
MOV into a scratch register the AND can only be scheduled after the
IMUL. gcc is reluctant to produce the MOV in this case and does the
AND after the IMUL instead, which adds latency (the read of *from can
only be initiated after the IMUL instruction is issued, albeit without
waiting for the IMUL to complete).
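
To make the two schedules concrete, here's roughly the hand-optimized
version I have in mind (same variables as in your snippet; untested,
and gcc may well coalesce the copy right back into value, so the
generated code would need checking either way):

    for (j = 0; j < blocklen; j++) {
        uint32_t v = value;                 /* explicit copy: the "extra MOV" */
        uint32_t *from = mem + (v & mask);  /* AND on the copy; the load can
                                               start while the IMUL runs */
        value = (value * (*prev++ | 3)) + *from;
        *to++ = value;
    }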

I think we could optimize this better by hand, but as I wrote in
another message we need the random lookups from "prev" (not from
"from") anyway. So it's that variant we'd need to benchmark and
optimize; a sketch follows below.
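
With the random lookup moved to "prev" and, as in your commented-out
line, "from" going sequential, the loop might look something like this
(purely illustrative; prevmask here is a placeholder for however the
index into the previous block would actually be derived):

    for (j = 0; j < blocklen; j++) {
        uint32_t *from = mem + j;  /* sequential, per the commented-out line */
        value = (value * (prev[value & prevmask] | 3)) + *from;
        *to++ = value;
    }
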
Alexander