Date: Mon, 25 Aug 2014 12:32:58 -0400
From: Bill Cox <>
Subject: Re: [PHC] Re: Tradeoff cryptanalysis of password hashing schemes

On 08/25/2014 04:58 AM, Dmitry Khovratovich wrote:
> As there are questions on this mailing list on how we got our
> energy & size estimates, here is a short insight.

Thanks for the data.

> A naive (no tradeoff) implementation of a password hashing scheme
> on ASIC with up to a GByte of RAM would use only a tiny portion of
> memory at every step. When the memory is not used, it still
> consumes power to sustain the state values. An energy-efficient
> adversary would use the memory that minimizes the total energy
> spent: for reading, writing, activating, etc. There exist low-power
> static RAM designs, of which the following two are interesting (we
> used [1]): [1] 850 MHz, 65nm design with retention mode leakage 0.1
> mW (10^{-4} Watt ) per MBit and active energy (read/write) 0.0001 J
> (10^-4) per GBit. [2] 333 MHz, 65nm design with retention mode
> leakage 0.00015 mW (10^{-7} Watt ) per MBit and active energy
> (read/write) 0.0004 J (10^-3.5) per GBit.
> 1 GByte of [1] occupies 12,000 mm^2, and of [2] -- 5,000 mm^2. This
> is certainly larger than the current CPU/GPU chip sizes (500-700
> mm^2 for recent Intel and nVidia designs), but comparable to
> largest CMOS sensors (40,000 mm^2 and more) and  certainly smaller
> than a single wafer. In fact, our own tradeoffs do not explicitly
> require all the memory being on a single chip, but it should have
> reasonable latency (<5 cycles).

That's 18 square inches.  No one will sell you an ASIC with more
than 1 square inch!  Please update your numbers to reflect the fact
that you will have to go to off-chip GDDR5 DRAM for a 1GiB hash.
There is simply no way around it.  Also, Catena is designed for
on-chip cache sizes of maybe up to 30MiB.  Comparison should be done
vs an on-chip SRAM cache with zero latency for Catena, since
predictable addressing allows us to have the data exactly when we need it.
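
As a quick check of the area figure, here is a minimal Python sketch that
converts the quoted SRAM areas (12,000 mm^2 and 5,000 mm^2 per GByte) to
square inches.  The function and variable names are illustrative only.

```python
# Convert the quoted SRAM die areas from mm^2 to square inches.
MM_PER_INCH = 25.4


def mm2_to_in2(area_mm2):
    """Return an area given in mm^2 expressed in square inches."""
    return area_mm2 / (MM_PER_INCH ** 2)


# Areas for 1 GByte of SRAM, as quoted for designs [1] and [2].
sram_1_in2 = mm2_to_in2(12_000)
sram_2_in2 = mm2_to_in2(5_000)

print(f"design [1]: {sram_1_in2:.1f} sq in")
print(f"design [2]: {sram_2_in2:.1f} sq in")
```

Either figure is several times the roughly 1 square inch limit mentioned
above.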

BTW, Intel uses eDRAM (embedded DRAM) in their latest Haswell
processors.  It is very tricky to integrate eDRAM, but they succeeded,
and the result is a dramatic power and area reduction.

Here's a recent GDDR5 datasheet:

This is a couple years out of date, but it's fairly close to state of
the art.  This is about the best DRAM you can buy for cracking
passwords.  Please consider using specs from real DRAM that is
commercially available, rather than something that is better than what
companies like Samsung are capable of.

The relevant pages are 128 and 129.  Supply voltage is 1.6V, and the
IDD4R and IDD4W are the supply current for reading and writing at
different speeds.  For 6 Gbps, it's 1.89A for writing and 1.82A for
reading.  At a slower 4 Gbps, it's 1.37A for writing and 1.3A for
reading.  The power is 2.9W for reading at 6Gbps, vs 2.1W for 4Gbps.
Clearly the dynamic power loss is larger than the static power loss.
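
The power figures above are just supply voltage times supply current.  A
minimal sketch of that arithmetic, using only the voltage and IDD4R/IDD4W
currents quoted above (the dictionary layout and names are illustrative
assumptions, not taken from the datasheet):

```python
# Power = supply voltage * supply current, for the quoted GDDR5 figures.
SUPPLY_VOLTAGE_V = 1.6

# Supply currents in amperes, keyed by (data rate in Gbps, operation).
SUPPLY_CURRENT_A = {
    (6, "write"): 1.89,
    (6, "read"): 1.82,
    (4, "write"): 1.37,
    (4, "read"): 1.30,
}


def power_watts(current_a, voltage_v=SUPPLY_VOLTAGE_V):
    """Return the power drawn at the given supply current."""
    return voltage_v * current_a


power_w = {key: power_watts(amps) for key, amps in SUPPLY_CURRENT_A.items()}

for (rate_gbps, op), watts in sorted(power_w.items()):
    print(f"{rate_gbps} Gbps {op}: {watts:.2f} W")
```

Reading comes out at roughly 2.9W at 6Gbps and 2.1W at 4Gbps, matching
the numbers above.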

Read latency is shown on page 42.  At 6Gbps, the first data from a new
1KiB block will take 19 clock cycles (CAS latency), assuming burst
mode is enabled (probably a good idea).  It's 1.5GHz, so that's
12.6ns.  After that, data streams at 6Gbps per pin, with 32 pins
(24GB/s total), so your 1KiB will take another 42.6ns to transfer.
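
The per-block access time can be sketched as CAS latency plus transfer
time.  This is a rough model under the figures quoted above (19 cycles at
1.5GHz, 32 pins at 6Gbps, 1KiB blocks); it treats the interface bandwidth
as 24e9 bytes per second and ignores refresh, bank conflicts and other
real-world effects.

```python
# Rough time to fetch one 1 KiB block from GDDR5: CAS latency + transfer.
CAS_CYCLES = 19
CLOCK_HZ = 1.5e9
PINS = 32
BITS_PER_SEC_PER_PIN = 6e9
BLOCK_BYTES = 1024

# Time before the first data arrives.
cas_latency_s = CAS_CYCLES / CLOCK_HZ

# Interface bandwidth in bytes per second (32 pins * 6 Gbps / 8).
bandwidth_bytes_per_s = PINS * BITS_PER_SEC_PER_PIN / 8

# Time to stream the whole block once data starts flowing.
transfer_s = BLOCK_BYTES / bandwidth_bytes_per_s

access_s = cas_latency_s + transfer_s

print(f"CAS latency: {cas_latency_s * 1e9:.1f} ns")
print(f"transfer:    {transfer_s * 1e9:.1f} ns")
print(f"total:       {access_s * 1e9:.1f} ns")
```

The total is about 55ns per sequential 1KiB access.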

A reasonable goal would be to have several such GDDR interfaces on
your ASIC.  It is very hard to do!  However, it is doable, at least in
theory.  If you had 512 pins all running at 6Gbps, you'd have 384GB/s
bandwidth.  That would probably set a world record for chip bandwidth,
but it's within a factor of 2X of what has been built before.
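
The aggregate bandwidth is simply pins times per-pin data rate.  A short
sketch, where the 512-pin configuration is the hypothetical one described
above:

```python
# Aggregate bandwidth of a hypothetical multi-interface GDDR5 ASIC.
BITS_PER_SEC_PER_PIN = 6e9


def bandwidth_bytes_per_s(pins, bits_per_sec_per_pin=BITS_PER_SEC_PER_PIN):
    """Return total bandwidth in bytes per second for the given pin count."""
    return pins * bits_per_sec_per_pin / 8


single_interface = bandwidth_bytes_per_s(32)
wide_asic = bandwidth_bytes_per_s(512)

print(f"32 pins:  {single_interface / 1e9:.0f} GB/s")
print(f"512 pins: {wide_asic / 1e9:.0f} GB/s")
print(f"interfaces needed: {wide_asic / single_interface:.0f}")
```

That is 16 of the 32-pin interfaces on one chip.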

Because recomputations increase the total read/writes per hash, you're
most likely going to *increase* DRAM power with a high TMTO.  The
lowest power attack is going to be with *no* TMTO.  You're also going
to max out that bandwidth pretty quickly, limiting computation speed.

If I read your table right, you do 21 read/writes to do a 1GiB hash.
I assume these are sequential operations on 1KiB blocks.  That's 2.56M
sequential external DRAM read/writes, at 55ns each, for 140ms.  You
simply can't do this in 20ms, as your slide claims.  The DRAM energy
is 2.9W for 140ms, or about 0.4J.  In your slide you had 0.034J.
You're off by 10X.  Assuming your AES power calculation is correct,
then DRAM power dominates.  Using more DRAM with a faster algorithm
(Argon does 21 sequential read/writes
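
The time and energy estimate above can be reproduced with a few lines of
Python.  This sketch takes the 2.56M access count, the 55ns access time
and the 2.9W read power from this message as given inputs, and the 0.034J
figure from the slide; it does not model the DRAM in any more detail.

```python
# Reproduce the DRAM time and energy estimate from the quoted figures.
SEQUENTIAL_ACCESSES = 2.56e6   # access count as stated in the message
ACCESS_TIME_S = 55e-9          # CAS latency + 1 KiB transfer
DRAM_POWER_W = 2.9             # read power at 6 Gbps
SLIDE_ENERGY_J = 0.034         # energy figure from the slide

# Total time spent waiting on sequential DRAM accesses.
total_time_s = SEQUENTIAL_ACCESSES * ACCESS_TIME_S

# Energy drawn by the DRAM over that time.
dram_energy_j = DRAM_POWER_W * total_time_s

# How far the slide's figure is from this estimate.
ratio = dram_energy_j / SLIDE_ENERGY_J

print(f"time:   {total_time_s * 1e3:.0f} ms")
print(f"energy: {dram_energy_j:.2f} J")
print(f"ratio to slide figure: {ratio:.0f}x")
```

This gives about 140ms and 0.4J, roughly an order of magnitude above the
slide's energy number.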

In comparison, you said Lyra2 would take 80ms to read/write 6GiB.  Why
would reading/writing 6GiB to external DRAM take longer for Lyra2 than
reading/writing 21GiB to the same external DRAM for Argon?

Also, when comparing the power for an ASIC to compute a password hash,
we need to use real memory sizes supported by the algorithm.  In 1
second, TwoCats can hash about 6GiB on my Ivy Bridge machine.  It is
heavily memory bandwidth limited.  It writes memory once and reads
memory once.  A system using TwoCats would hash several times more
memory in the same time as Argon, and this should be taken into
account when considering an ASIC attack.  A 1GiB Argon password guess
likely takes an ASIC 6-ish times longer to do a 6-ish GiB TwoCats
guess, with multiplication chains turned off.  When enabled,

This power discussion seems very relevant.  Let's get the numbers
right!  Please correct any incorrect assumptions above.
