phc-discussions - Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

Message-ID: <20140213173158.GA6622@openwall.com>
Date: Thu, 13 Feb 2014 21:31:58 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] multiply-hardening (Re: NoelKDF ready for submission)

On Thu, Feb 13, 2014 at 12:11:43PM -0500, Bill Cox wrote:
> Awesome.  I'll check out that paper.  I'm currently getting 3 cycle
> latency for 32x32->32 plus 1 cycle for the add on Ivy Bridge.  It's
> the other stuff, the OR, ADD, and memory I/O that seems to increase
> the SSE 4.1 latency.  The multiply is 5 cycles, I think.

Yes, you'd have something like 7 cycles on the critical path: 5 for the
SIMD multiply, 1 for the ADD, and 1 for the OR.  The memory write can
overlap with the start of the next loop iteration's computation, as long
as the loop is unrolled (did you use -funroll-loops?)
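
To make the dependency chain concrete, here is a minimal scalar sketch
in C.  It is an illustration only, not the actual NoelKDF code: the
function name mul_chain, the "| 3" constant, and the use of the loop
index as the added term are all assumptions.  Each iteration needs the
previous value (OR, then multiply, then ADD), while the store feeds
nothing in the chain and so can overlap with the next iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical multiply-hardened fill loop (illustration only).
 * Critical path per iteration: OR -> 32x32->32 multiply -> ADD.
 * With the SIMD latencies quoted in this thread that would be
 * roughly 1 + 5 + 1 = 7 cycles; the figures are not measured here.
 */
static uint32_t mul_chain(uint32_t *mem, size_t n, uint32_t seed)
{
    assert(mem != NULL && n > 0);

    uint32_t value = seed;
    mem[0] = value;
    for (size_t i = 1; i < n; i++) {
        /* "| 3" forces the two low bits of the multiplier to 1 */
        value = value * (value | 3) + (uint32_t)i;
        /* the stored word is not read back by the chain above */
        mem[i] = value;
    }
    return value;
}
```

With gcc, building this with something like -O2 -funroll-loops lets
the compiler unroll the loop; the serial multiply chain itself stays
the same length either way.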

> I don't know if it's worth it to worry about an attacker's die area,
> except for RAM if we force him to use cache.

With very low memory settings, die area occupied by multipliers may be
comparable to or higher than that occupied by memory.  I think you're
simply not considering settings this low - but I think we should.

Also, the number of multipliers available to the defender may increase a
lot, and it'd be nice if we supported scaling up in that respect.

Alexander