phc-discussions - Re: [PHC] Re: Tradeoff cryptanalysis of password hashing schemes

Message-ID: <CALW8-7J1iNWLOxDbvPqBQS+TBsiTL_gxUT8O65-RytwOUcpijA@mail.gmail.com>
Date: Wed, 3 Sep 2014 16:30:34 +0200
From: Dmitry Khovratovich <khovratovich@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Re: Tradeoff cryptanalysis of password hashing schemes

Some more details for the ASIC discussion.

1) An ASIC-equipped adversary will aim to minimize the running costs, which
in the long term will be dominated by power consumption and cooling.

2) A high-budget adversary will not restrict himself to commercially
available memory chips only, but will definitely consider custom designs.

3) The memory power consumption (roughly) consists of retention power (to
sustain the state) and active energy (to read or write).

4) Regarding active energy, it would be natural to prefer a memory
architecture that consumes 0.0001 J to read 1 Gbit (one of the SRAM designs,
[1]) to one that consumes 0.5 J to read 1 Gbit (GDDR5).

5) I did not calculate the retention power for GDDR5, but to match [1] (0.1
W per Gbit) it would have to be around 3% of the maximum power consumption,
not to mention the 0.0001 W per Gbit of design [2].

6) There are other low-power DRAM designs that can be considered here.

7) Large on-chip memory is needed mainly to reduce the latency. However,
for the schemes currently attacked this is not a problem. Catena has
memory-independent addressing, and Lyra2 has huge blocks sequentially
stored in the memory. As a result, off-chip memory with latency up to 10-15
cycles is still suitable for the tradeoff attacks as the latency of the
entire scheme remains pretty much the same.

8) If we consider adversaries restricted to DRAM, then Lyra2 can be run with
1/2 of the memory with no energy penalty: the increase in memory reads from 6
GB to 7.6 GB per password and the 20% increase in running time are
compensated by the 50% memory reduction. It may even be that running Lyra2
with 1/2 of the memory takes less energy than with full memory.
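
The tradeoff in point 8) can be sketched numerically. This is only a rough
illustration under a two-term model that is an assumption, not part of the
analysis above: active energy is taken to scale with the amount of memory
traffic, and retention energy with memory size times running time. The
function names are hypothetical; the ratios (6 GB to 7.6 GB of reads, 20%
longer running time, 1/2 of the memory) are the ones given in point 8).

```python
# Rough two-term energy model (an assumption for illustration):
#   energy = active energy (scales with memory traffic)
#          + retention energy (scales with memory size * running time)

def relative_energy(retention_share, traffic_ratio=7.6 / 6.0,
                    time_ratio=1.2, memory_ratio=0.5):
    """Energy of the reduced-memory run relative to the full-memory run.

    retention_share is the fraction (0..1) of the full-memory energy
    that goes into retention; the rest is active read/write energy.
    """
    active_share = 1.0 - retention_share
    return (active_share * traffic_ratio
            + retention_share * memory_ratio * time_ratio)


def break_even_share(traffic_ratio=7.6 / 6.0, time_ratio=1.2,
                     memory_ratio=0.5):
    """Retention share at which both runs cost the same energy."""
    extra_active = traffic_ratio - 1.0                # more reads
    saved_retention = 1.0 - memory_ratio * time_ratio  # less memory held
    return extra_active / (extra_active + saved_retention)


if __name__ == "__main__":
    print("break-even retention share:", round(break_even_share(), 3))
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(share, round(relative_energy(share), 3))
```

Under this model the half-memory run costs no more energy whenever
retention accounts for at least the break-even share of the full-memory
energy, and costs more when active energy dominates.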


On Mon, Aug 25, 2014 at 6:32 PM, Bill Cox <waywardgeek@...hershed.org>
wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 08/25/2014 04:58 AM, Dmitry Khovratovich wrote:
> > As there are questions on this mailing list on how we got our
> > energy & size estimates, here is a short insight.
>
> Thanks for the data.
>
> > A naive (no tradeoff) implementation of a password hashing scheme
> > on ASIC with up to a GByte of RAM would use only a tiny portion of
> > memory at every step. When the memory is not used, it still
> > consumes power to sustain the state values. An energy-efficient
> > adversary would use the memory that minimizes the total energy
> > spent: for reading, writing, activating, etc. There exist low-power
> > static RAM designs, of which the following two are interesting (we
> > used [1]): [1] 850 MHz, 65nm design with retention mode leakage 0.1
> > mW (10^{-4} Watt ) per MBit and active energy (read/write) 0.0001 J
> > (10^-4) per GBit. [2] 333 MHz, 65nm design with retention mode
> > leakage 0.00015 mW (10^{-7} Watt ) per MBit and active energy
> > (read/write) 0.0004 J (10^-3.5) per GBit.
> >
> > 1 GByte of [1] occupies 12,000 mm^2, and of [2] -- 5,000 mm^2. This
> > is certainly larger than the current CPU/GPU chip sizes (500-700
> > mm^2 for recent Intel and nVidia designs), but comparable to
> > largest CMOS sensors (40,000 mm^2 and more) and certainly smaller
> > than a single wafer. In fact, our own tradeoffs do not explicitly
> > require all the memory being on a single chip, but it should have
> > reasonable latency (<5 cycles).
>
> That's 18 square inches.  No one will sell you an ASIC with more
> than 1 square inch!  Please update your numbers to reflect the fact
> that you will have to go to off-chip GDDR5 DRAM for a 1GiB hash.
> There is simply no way around it.  Also, Catena is designed for
> on-chip cache sizes of maybe up to 30MiB.  Comparison should be done
> vs an on-chip SRAM cache with zero latency for Catena, since
> predictable addressing allows us to have the data exactly when we need it.
>
> BTW, Intel uses eDRAM (embedded DRAM) in their latest Haswell
> processors.  It is very tricky to integrate eDRAM, but they succeeded,
> and the result is a dramatic power and area reduction.
>
> Here's a recent GDDR5 datasheet:
>
> http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR%28Rev1.0%29.pdf
>
> This is a couple years out of date, but it's fairly close to state of
> the art.  This is about the best DRAM you can buy for cracking
> passwords.  Please consider using specs from real DRAM that is
> commercially available, rather than something that is better than what
> companies like Samsung are capable of.
>
> The relevant pages are 128 and 129.  Supply voltage is 1.6V, and the
> IDD4R and IDD4W are the supply current for reading and writing at
> different speeds.  For 6 Gbps, it's 1.89A for writing and 1.82A for
> reading.  At a slower 4 Gbps, it's 1.37A for writing and 1.3A for
> reading.  The power is 2.9W for reading at 6Gbps, vs 2.1W for 4Gbps.
> Clearly the dynamic power loss is larger than the static power loss.
>
> Read latency is shown on page 42.  At 6Gbps, the first data from a new
> 1KiB block will take 19 clock cycles (CAS latency), assuming burst
> mode is enabled (probably a good idea).  It's 1.5GHz, so that's
> 12.6ns.  After that, data streams at 6Gbps per pin, with 32 pins
> (24GiB/s total), so your 1KiB will take another 42.6ns to transfer.
>
> A reasonable goal would be to have several such GDDR interfaces on
> your ASIC.  It is very hard to do!  However, it is doable, at least in
> theory.  If you had 512 pins all running at 6Gbps, you'd have 384GiB/s
> bandwidth.  That would probably set a world record for chip bandwidth,
> but it's within a factor of 2X of what has been built before.
>
> Because recomputations increase the total read/writes per hash, you're
> going to most likely *increase* DRAM power with a high TMTO.  The
> lowest power attack is going to be with *no* TMTO.  You're also going
> to max out that bandwidth pretty quickly, limiting computation speed.
>
> If I read your table right, you do 21 read/writes to do a 1GiB hash.
> I assume these are sequential operations on 1KiB blocks.  That's 2.56M
> sequential external DRAM read/writes, at 55ns each, for 140ms.  You
> simply can't do this in 20ms, as your slide claims.  The DRAM energy
> is 2.9W for 140ms, or about 0.4J.  In your slide you had 0.034J.
> You're off by 10X.  Assuming your AES power calculation is correct,
> then DRAM power dominates.  Using more DRAM with a faster algorithm
> (Argon does 21 sequential read/writes
>
> In comparison, you said Lyra2 would take 80ms to read/write 6GiB.  Why
> would reading/writing 6GiB to external DRAM take longer for Lyra2 than
> reading/writing 21GiB to the same external DRAM for Argon?
>
> Also, when comparing the power for an ASIC to compute a password hash,
> we need to use real memory sizes supported by the algorithm.  In 1
> second, TwoCats can hash about 6GiB on my Ivy Bridge machine.  It is
> heavily memory bandwidth limited.  It writes memory once and reads
> memory once.  A system using TwoCats would hash several times more
> memory in the same time as Argon, and this should be taken into
> account when considering an ASIC attack.  A 1GiB Argon password guess
> likely takes an ASIC 6-ish times longer to do a 6-ish GiB TwoCats
> guess, with multiplication chains turned off.  When enabled,
>
> This power discussion seems very relevant.  Let's get the numbers
> right!  Please correct any incorrect assumptions above.
>
> Bill
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQIcBAEBAgAGBQJT+2U2AAoJEAcQZQdOpZUZuPoP/RJbpUakvFZRxfQu/0r4uxiw
> NmcssnLPJXi/NiaY9+mKxhva2BdKVY3a1LGxytmHIvtvtyIWWkSqZjicMzSI5+TA
> mVqK3nMbSGyyDm0MOB2SFzwqgiOB9sbHLlBBw2hMCQyVXpP0NF3UN6bhrwbLu7mO
> AQd5UA7X8jvYCrXYigs2aAZKC8cBznbeyYFxuEqe9j1XelgFzLIfJtVGrKL07pEO
> hl49o17K/psnNvxrg2JxyUiALPWddrrxlP8oE/9/+2wcZmoqRcd6KD6ATuf+8lUp
> hP7vLpQRcKKGUO0udGzA5IhlPCmKNofVX5yrT0AHkA4+X/wsmkvXZfDuWPwQ8glZ
> X7tKgF1qVmmWQQujlJKhuRlvkDNxDNUj9ciikg/cuTHZCq1ojpGmA/2elrMQTIln
> dfGZRb9ru8OwQd+Ers7NZdgw4nRjzLdlAoITvRs7rOb9z105r2Rs5hchR/zb8dWb
> Qa68SbGiZDR+uGF3sdJFzQ+j8gfz5GLx7lINHDvLY+DlvqEtWjn+NxiXK8qihysE
> JK/gBlXIdvzotxLkipExvbr+s/obcu7JysaICIIsWsJDMXHvUiEHejgoiZODOnBz
> Jgj3PHALOKg2Apfe4yE4Vt7yLzja2Spn8E5m2uCR/xLaiAROJ5mskH18Xvp0tNiX
> jNAyDUPqFZEzisM84IVp
> =fg7Z
> -----END PGP SIGNATURE-----
>
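
The GDDR5 arithmetic in the quoted message can be reproduced with a short
sketch. It uses only the figures quoted above (1.6V supply, 1.82A read
current at 6Gbps, 32 pins, 19-cycle CAS latency at 1.5GHz, 1KiB blocks);
the variable names are illustrative, and the 2.56M access count is taken
as given from the message rather than derived here.

```python
# Figures as quoted from the GDDR5 datasheet in the message above.
SUPPLY_V = 1.6            # supply voltage, volts
READ_CURRENT_A = 1.82     # IDD4R at 6 Gbps, amperes
PIN_RATE_BPS = 6e9        # data rate per pin, bits per second
PINS = 32                 # data pins on one device
CLOCK_HZ = 1.5e9          # command clock
CAS_CYCLES = 19           # read latency in clock cycles
BLOCK_BITS = 1024 * 8     # one 1 KiB block
ACCESSES = 2.56e6         # sequential block accesses, as stated (not derived)

read_power_w = SUPPLY_V * READ_CURRENT_A            # about 2.9 W
cas_latency_s = CAS_CYCLES / CLOCK_HZ               # about 12.7 ns
transfer_s = BLOCK_BITS / (PIN_RATE_BPS * PINS)     # about 42.7 ns
access_s = cas_latency_s + transfer_s               # about 55 ns per block

total_time_s = ACCESSES * access_s                  # about 0.14 s
energy_j = read_power_w * total_time_s              # about 0.4 J

if __name__ == "__main__":
    print(f"read power   : {read_power_w:.2f} W")
    print(f"per access   : {access_s * 1e9:.1f} ns")
    print(f"total time   : {total_time_s * 1e3:.0f} ms")
    print(f"DRAM energy  : {energy_j:.2f} J")
```

The sketch treats every access as a full CAS latency plus a full block
transfer at the read power figure, which is the same simplification the
quoted estimate makes.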



-- 
Best regards,
Dmitry Khovratovich
