phc-discussions - Re: [PHC] Re: Tradeoff cryptanalysis of password hashing schemes

Message-ID: <53FB653A.6090005@ciphershed.org>
Date: Mon, 25 Aug 2014 12:32:58 -0400
From: Bill Cox <waywardgeek@...hershed.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Re: Tradeoff cryptanalysis of password hashing schemes

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/25/2014 04:58 AM, Dmitry Khovratovich wrote:
> As there are questions on this mailing list on how we got our
> energy & size estimates, here is a short insight.

Thanks for the data.

> A naive (no tradeoff) implementation of a password hashing scheme
> on ASIC with up to a GByte of RAM would use only a tiny portion of
> memory at every step. When the memory is not used, it still
> consumes power to sustain the state values. An energy-efficient
> adversary would use the memory that minimizes the total energy
> spent: for reading, writing, activating, etc. There exist low-power
> static RAM designs, of which the following two are interesting (we
> used [1]): [1] 850 MHz, 65nm design with retention mode leakage 0.1
> mW (10^{-4} Watt ) per MBit and active energy (read/write) 0.0001 J
> (10^-4) per GBit. [2] 333 MHz, 65nm design with retention mode
> leakage 0.00015 mW (10^{-7} Watt ) per MBit and active energy
> (read/write) 0.0004 J (10^-3.5) per GBit.
> 
> 1 GByte of [1] occupies 12,000 mm^2, and of [2] -- 5,000 mm^2. This
> is certainly larger than the current CPU/GPU chip sizes (500-700
> mm^2 for recent Intel and nVidia designs), but comparable to
> largest CMOS sensors (40,000 mm^2 and more) and  certainly smaller
> than a single wafer. In fact, our own tradeoffs do not explicitly
> require all the memory being on a single chip, but it should have
> reasonable latency (<5 cycles).

That's 18 square inches.  No one will sell you an ASIC with more
than 1 square inch!  Please update your numbers to reflect the fact
that you will have to go to off-chip GDDR5 DRAM for a 1GiB hash.
There is simply no way around it.  Also, Catena is designed for
on-chip cache sizes of maybe up to 30MiB.  Comparison should be done
vs an on-chip SRAM cache with zero latency for Catena, since
predictable addressing allows us to have the data exactly when we need it.
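
The area conversion behind the "18 square inches" remark can be checked with a few lines of Python.  This is only a sketch of the arithmetic; the function and variable names are made up for illustration, and the two areas are the ones quoted above for designs [1] and [2].

```python
# Convert the quoted SRAM die areas from mm^2 to square inches.
MM2_PER_SQUARE_INCH = 25.4 ** 2  # 1 inch = 25.4 mm, so 645.16 mm^2 per in^2


def mm2_to_square_inches(area_mm2):
    """Return the area in square inches for an area given in mm^2."""
    return area_mm2 / MM2_PER_SQUARE_INCH


design_1_in2 = mm2_to_square_inches(12_000)  # 1 GByte of design [1]
design_2_in2 = mm2_to_square_inches(5_000)   # 1 GByte of design [2]

print(f"design [1]: {design_1_in2:.1f} in^2")
print(f"design [2]: {design_2_in2:.1f} in^2")
```

Even the denser design [2] comes out at several square inches under this conversion, well above the 1 square inch limit mentioned above.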

BTW, Intel uses eDRAM (embedded DRAM) in their latest Haswell
processors.  It is very tricky to integrate eDRAM, but they succeeded,
and the result is a dramatic power and area reduction.

Here's a recent GDDR5 datasheet:

http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR%28Rev1.0%29.pdf

This is a couple years out of date, but it's fairly close to state of
the art.  This is about the best DRAM you can buy for cracking
passwords.  Please consider using specs from real DRAM that is
commercially available, rather than something that is better than what
companies like Samsung are capable of.

The relevant pages are 128 and 129.  Supply voltage is 1.6V, and the
IDD4R and IDD4W are the supply current for reading and writing at
different speeds.  For 6 Gbps, it's 1.89A for writing and 1.82A for
reading.  At a slower 4 Gbps, it's 1.37A for writing and 1.3A for
reading.  The power is 2.9W for reading at 6Gbps, vs 2.1W for 4Gbps.
Clearly the dynamic power loss is larger than the static power loss.
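
The power figures are just supply voltage times supply current.  A minimal Python sketch, using only the voltage and currents quoted above (the dictionary layout and function name are illustrative, not anything from the datasheet):

```python
# DRAM power as P = V * I, using the supply figures quoted in the text.
VDD_VOLTS = 1.6

# (operation, per-pin data rate in Gbps) -> supply current in amps
SUPPLY_CURRENT_AMPS = {
    ("read", 6): 1.82,
    ("write", 6): 1.89,
    ("read", 4): 1.30,
    ("write", 4): 1.37,
}


def power_watts(operation, gbps):
    """Return the supply power in watts for one operation and data rate."""
    return VDD_VOLTS * SUPPLY_CURRENT_AMPS[(operation, gbps)]


for (operation, gbps) in SUPPLY_CURRENT_AMPS:
    print(f"{operation:5s} at {gbps} Gbps: {power_watts(operation, gbps):.2f} W")

# How much read power drops when the data rate drops from 6 to 4 Gbps.
read_power_drop = power_watts("read", 6) - power_watts("read", 4)
print(f"read power drop from 6 to 4 Gbps: {read_power_drop:.2f} W")
```

The gap between the two data rates is the part of the power that scales with activity.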

Read latency is shown on page 42.  At 6Gbps, the first data from a new
1KiB block will take 19 clock cycles (CAS latency), assuming burst
mode is enabled (probably a good idea).  It's 1.5GHz, so that's
12.6ns.  After that, data streams at 6Gbps per pin, with 32 pins
(24GB/s total), so your 1KiB will take another 42.6ns to transfer.
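
The per-block timing can be reproduced as follows.  This is a sketch under the assumptions stated in the text (19-cycle CAS latency, 1.5GHz clock, 32 pins at 6Gbps, 1KiB blocks); the bus rate is treated as decimal bytes per second, which is what reproduces the 42.6ns figure.  The constant names are illustrative.

```python
# Time to fetch one 1 KiB block: CAS latency plus burst transfer time.
CAS_CYCLES = 19
CLOCK_HZ = 1.5e9
PIN_RATE_BITS_PER_S = 6e9
PINS = 32
BLOCK_BYTES = 1024

cas_latency_s = CAS_CYCLES / CLOCK_HZ
bus_bytes_per_s = PIN_RATE_BITS_PER_S * PINS / 8  # 8 bits per byte
transfer_s = BLOCK_BYTES / bus_bytes_per_s
total_s = cas_latency_s + transfer_s

print(f"CAS latency:   {cas_latency_s * 1e9:.1f} ns")
print(f"bus bandwidth: {bus_bytes_per_s / 1e9:.0f} GB/s")
print(f"transfer time: {transfer_s * 1e9:.1f} ns")
print(f"total:         {total_s * 1e9:.1f} ns")
```

The total, roughly 55ns per block, is the per-access cost for a dependent sequence of 1KiB reads or writes on one such interface.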

A reasonable goal would be to have several such GDDR interfaces on
your ASIC.  It is very hard to do!  However, it is doable, at least in
theory.  If you had 512 pins all running at 6Gbps, you'd have 384GB/s
bandwidth.  That would probably set a world record for chip bandwidth,
but it's within a factor of 2X of what has been built before.
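
The aggregate figure follows directly from the pin count.  A small sketch, assuming 32 data pins per GDDR5 interface as in the timing discussion above (the function name is illustrative):

```python
# Aggregate bandwidth of a hypothetical 512-pin GDDR5 attachment.
PIN_RATE_BITS_PER_S = 6e9
PINS_PER_INTERFACE = 32
TOTAL_PINS = 512


def aggregate_bytes_per_s(pins, pin_rate_bits_per_s):
    """Return total bandwidth in bytes per second for the given pin count."""
    return pins * pin_rate_bits_per_s / 8


interfaces = TOTAL_PINS // PINS_PER_INTERFACE
total_bytes_per_s = aggregate_bytes_per_s(TOTAL_PINS, PIN_RATE_BITS_PER_S)

print(f"{interfaces} interfaces, {total_bytes_per_s / 1e9:.0f} GB/s total")
```

That is 16 independent 32-pin interfaces on one ASIC, which gives a sense of why this is hard to build.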

Because recomputations increase the total read/writes per hash, you're
most likely going to *increase* DRAM power with a high TMTO.  The
lowest power attack is going to be with *no* TMTO.  You're also going
to max out that bandwidth pretty quickly, limiting computation speed.

If I read your table right, you do 21 read/writes to do a 1GiB hash.
I assume these are sequential operations on 1KiB blocks.  That's 2.56M
sequential external DRAM read/writes, at 55ns each, for 140ms.  You
simply can't do this in 20ms, as your slide claims.  The DRAM energy
is 2.9W for 140ms, or about 0.4J.  In your slide you had 0.034J.
You're off by 10X.  Assuming your AES power calculation is correct,
then DRAM power dominates.  Using more DRAM with a faster algorithm
(Argon does 21 sequential read/writes
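
The time and energy estimate in this paragraph can be sketched as below.  The operation count (2.56M), the 55ns per access, the 2.9W read power, and the 0.034J slide figure are all taken from the text as given; nothing here is an independent measurement, and the variable names are illustrative.

```python
# DRAM time and energy for one hash, using the figures stated in the text.
SEQUENTIAL_OPS = 2.56e6        # sequential 1 KiB read/writes, as stated
SECONDS_PER_OP = 55e-9         # CAS latency plus 1 KiB transfer
DRAM_POWER_WATTS = 2.9         # read power at 6 Gbps
SLIDE_ENERGY_JOULES = 0.034    # figure quoted from the slide

hash_time_s = SEQUENTIAL_OPS * SECONDS_PER_OP
dram_energy_j = DRAM_POWER_WATTS * hash_time_s
ratio_to_slide = dram_energy_j / SLIDE_ENERGY_JOULES

print(f"time per hash:   {hash_time_s * 1e3:.0f} ms")
print(f"DRAM energy:     {dram_energy_j:.2f} J")
print(f"ratio to slide:  {ratio_to_slide:.0f}x")
```

Under these inputs the DRAM energy is roughly an order of magnitude above the slide's number.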

In comparison, you said Lyra2 would take 80ms to read/write 6GiB.  Why
would reading/writing 6GiB to external DRAM take longer for Lyra2 than
reading/writing 21GiB to the same external DRAM for Argon?
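
One way to see the inconsistency is to compute the memory bandwidth each claim implies.  A sketch, using the 80ms/6GiB figure for Lyra2 and the 20ms slide figure with 21GiB of traffic for Argon, both as quoted in this message (names are illustrative):

```python
# Memory bandwidth implied by each claimed running time.
GIB = 2 ** 30


def implied_bytes_per_s(bytes_moved, seconds):
    """Return the sustained bandwidth needed to move the data in the time."""
    return bytes_moved / seconds


lyra2_bw = implied_bytes_per_s(6 * GIB, 0.080)   # 6 GiB in 80 ms
argon_bw = implied_bytes_per_s(21 * GIB, 0.020)  # 21 GiB in 20 ms
bandwidth_ratio = argon_bw / lyra2_bw

print(f"Lyra2 implied bandwidth: {lyra2_bw / GIB:.0f} GiB/s")
print(f"Argon implied bandwidth: {argon_bw / GIB:.0f} GiB/s")
print(f"ratio: {bandwidth_ratio:.0f}x")
```

If both schemes are attacked with the same external DRAM, the two figures should imply similar bandwidth, not a gap of this size.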

Also, when comparing the power for an ASIC to compute a password hash,
we need to use real memory sizes supported by the algorithm.  In 1
second, TwoCats can hash about 6GiB on my Ivy Bridge machine.  It is
heavily memory bandwidth limited.  It writes memory once and reads
memory once.  A system using TwoCats would hash several times more
memory in the same time as Argon, and this should be taken into
account when considering an ASIC attack.  A 1GiB Argon password guess
likely takes an ASIC 6-ish times longer to do a 6-ish GiB TwoCats
guess, with multiplication chains turned off.  When enabled,

This power discussion seems very relevant.  Let's get the numbers
right!  Please correct any incorrect assumptions above.

Bill
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJT+2U2AAoJEAcQZQdOpZUZuPoP/RJbpUakvFZRxfQu/0r4uxiw
NmcssnLPJXi/NiaY9+mKxhva2BdKVY3a1LGxytmHIvtvtyIWWkSqZjicMzSI5+TA
mVqK3nMbSGyyDm0MOB2SFzwqgiOB9sbHLlBBw2hMCQyVXpP0NF3UN6bhrwbLu7mO
AQd5UA7X8jvYCrXYigs2aAZKC8cBznbeyYFxuEqe9j1XelgFzLIfJtVGrKL07pEO
hl49o17K/psnNvxrg2JxyUiALPWddrrxlP8oE/9/+2wcZmoqRcd6KD6ATuf+8lUp
hP7vLpQRcKKGUO0udGzA5IhlPCmKNofVX5yrT0AHkA4+X/wsmkvXZfDuWPwQ8glZ
X7tKgF1qVmmWQQujlJKhuRlvkDNxDNUj9ciikg/cuTHZCq1ojpGmA/2elrMQTIln
dfGZRb9ru8OwQd+Ers7NZdgw4nRjzLdlAoITvRs7rOb9z105r2Rs5hchR/zb8dWb
Qa68SbGiZDR+uGF3sdJFzQ+j8gfz5GLx7lINHDvLY+DlvqEtWjn+NxiXK8qihysE
JK/gBlXIdvzotxLkipExvbr+s/obcu7JysaICIIsWsJDMXHvUiEHejgoiZODOnBz
Jgj3PHALOKg2Apfe4yE4Vt7yLzja2Spn8E5m2uCR/xLaiAROJ5mskH18Xvp0tNiX
jNAyDUPqFZEzisM84IVp
=fg7Z
-----END PGP SIGNATURE-----