Open Source and information security mailing list archives
 
Date: Sat, 11 Jan 2014 18:45:56 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] escrypt memory access speed (Re: [PHC] Reworked KDF
 available on github for feedback: NOELKDF)

On Sat, Jan 11, 2014 at 5:41 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Sat, Jan 11, 2014 at 1:42 PM, Bill Cox wrote:
>> Imagine that all the lines of code you've written in C for escrypt,
>> other than for memory access, will cost an attacker maybe $0.02 per
>> core, and that Salsa20/20 will run in maybe 2 clocks instead of 1 for
>> Salsa20/1.  This is roughly the case, within maybe 2X in cost and
>> clock cycles.  It's pointless trying to make an ASIC attacker's
>> arithmetic cost significant in either time or money, so we have to
>> focus on memory size and bandwidth.  For defense against ASIC attacks,
>> your Salsa20/1 benchmark is the best.  50GB/s!  Nice!
>
> Is this entirely true?  That is, I always thought that the fancy ALUs
> on modern CPUs were big and expensive, and that they accounted for
> non-negligible fractions of the cost.  If so, that would mean that a
> good password hash should try to max them out.  There's also cache
> size and bandwidth, and I assume that sticking really fast cache on an
> ASIC is just as expensive as the really fast cache that already exists
> on CPU dies.

Yes, it's really true.  Intel CPUs have an unbelievable amount of
logic on them, but I've always felt half or more of it is for
backwards compatibility.  Check out this NVIDIA ARM-based ASIC die
picture:

http://www.bdti.com/InsideDSP/2011/10/20/NvidiaQualcomm

If you can find the ALUs, your eyes are better than mine!  The die is
80-90% RAM (minus pad ring), and that's doing a lot of processing.
Can you find the multipliers?  They should be visible, but I didn't
see them.

> (I could be wrong here.  I've fiddled with high-end Virtex parts, but
> I have no real concept of what ASIC logic costs.)

You are right that cache on an ASIC is just as expensive (or more)
than on a CPU.  Hammering cache in a KDF is an excellent defense
against ASIC attacks.  However, you can't count on having a very low
cost*time multiplier when an ASIC does the whole attack on-chip
(nothing close to the 4-5X per ASIC vs Alexander's 50GB/s benchmark).
It can split that cache into 100 banks and the bandwidth can go
through the roof.  Think terabytes per second.  In that case, your
best defense is that sequential inner loop, but like I said,
Salsa20/20 is likely only 2-ish clock cycles (maybe 1 or 4?), so
making the inner loop complex doesn't buy you much.  You'd still have
trouble finding that ALU vs the cache it's sitting next to.
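
A rough back-of-the-envelope sketch of the banking argument above.  Only
the bank count (100) and the 50GB/s benchmark figure come from this
thread; the per-bank access width and the clock rate are purely
illustrative assumptions, not measurements of any real ASIC, and the
function name is made up for the example.

```python
def peak_bandwidth_bytes_per_s(banks, bytes_per_access, clock_hz):
    """Peak aggregate bandwidth if every bank serves one access per clock."""
    return banks * bytes_per_access * clock_hz

BANKS = 100                  # from the text: cache split into 100 banks
BYTES_PER_ACCESS = 64        # assumption: one 64-byte line per bank per clock
CLOCK_HZ = 1_000_000_000     # assumption: 1 GHz on-chip clock
CPU_BENCHMARK = 50e9         # the 50GB/s benchmark figure quoted in this thread

peak = peak_bandwidth_bytes_per_s(BANKS, BYTES_PER_ACCESS, CLOCK_HZ)
ratio = peak / CPU_BENCHMARK

print(f"{peak / 1e12:.1f} TB/s peak, {ratio:.0f}x the 50 GB/s benchmark")
```

Under those assumed numbers the on-chip peak lands in the terabytes per
second range, which is the point: the figure scales linearly with the
number of banks, so it is an upper bound on what banking could buy, not
a prediction.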

On the positive side, the latest generation of custom ASICs on the
most advanced processes are getting too expensive for average Joe
crackers, while our CPUs keep chugging along according to Moore's Law.
Government-sponsored crackers are another story.

Bill
