lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOLP8p4LBDoCQgj4p37GLr5WmrSn+Y3y_YgjxL6mNt+G-FAD-Q@mail.gmail.com>
Date: Fri, 28 Aug 2015 07:31:31 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Flaw in Argon2 TMTO ASIC analysis

It's proportional to the number of I/Os, which on most ASICs is
proportional to sqrt(area).  This is why I doubt the optimal cost ASIC
attack will maximize I/O bandwidth, but assuming that the bandwidth is
fixed, and is the limiting factor, the analysis is the same.

With multiplication chain compute-time hardening, I think you are right
that the simple ASIC attack will not be able to use it's potential
bandwidth.  To make use of it's bandwidth, the an attacker will likely use
a 2X memory reduction, doubling the required bandwidth but reducing his
memory cost.  At that point he'll optimize as many factors as he can to get
around the compute-time hardening, including:

- Interleaving 2 computations, doubling memory bandwidth _and_ memory size,
but the total system cost includes the ASIC cost which does not increase in
this case.
- Using a cheaper ASIC with fewer I/O pads and reduced bandwidth

The real world is messy.  However, the simple attack ASIC attack storing
every 1/alpha memory location without compute time hardening and external
DRAM for storage is easy.  I would prefer that the paper stick to this case
in the basic analysis and then discuss ramifications of improvements such
as multiplication chains or integrating logic directly onto the DRAM ICs.
Compute time hardening is even more important in that case, I think.

Bill

On Fri, Aug 28, 2015 at 6:49 AM, Dmitry Khovratovich <khovratovich@...il.com
> wrote:

> Sorry, contunuing "The only hesitation is that we do not have
> authoritative sources on maximal ASIC bandwidth (should it be proportional
> to size?)"
>
> On Fri, Aug 28, 2015 at 3:48 PM, Dmitry Khovratovich <
> khovratovich@...il.com> wrote:
>
>> Hi Bill,
>>
>> You're right that the total bandwidth grows proportionally to C(alpha),
>> and if the bandwidth has been exhausted, then indeed the running time
>> should grow proportionally to C(alpha). We plan to account for bandwidth
>> limitations in the further revisions of our paper(s) on tradeoff analysis
>> and Argon2. The only hesitation is that
>>
>> However for so small memory reductions I think that the running time
>> should not grow. If the ASIC running time is about the same as for CPU
>> (which is reasonable to expect), then the bandwidth would be about 20-50
>> GB/sec, so it can grow by the factor 4-8 without affecting running time.
>>
>> Thanks for reminding anyway.
>>
>> BTW we also have a more detailed series for C(alpha) which covers
>>  non-integer alpha as well.
>>
>> On Fri, Aug 28, 2015 at 3:33 PM, Bill Cox <waywardgeek@...il.com> wrote:
>>
>>> I objected to this flawed analysis when applied to Lyra2 and Yescrypt,
>>> so it's only fair I object when applied to Argon2 :)
>>>
>>> The Argon2 paper states:
>>>
>>> "We conclude that for data-dependent one-pass schemes the adversary is
>>> always able to reduce the memory
>>> by the factor of 4 and still keep the time-area product the same."
>>>
>>> This is wrong.  Given their simple attack, there is no memory reduction
>>> factor that results in any time-area product reduction.  The algorithm is
>>> far more TMTO resistant than the authors claim (as is Lyra2).
>>>
>>> In their attack, they assume the simple model where alpha is an integer
>>> and we keep every 1/alpha memory location in external DRAM.  When using 1/2
>>> the memory (alpha = 1/2), and a uniform random index function (the square
>>> function used now is better), the average recomputations, which are called
>>> C(alpha), goes to 2 for large memory hashing.  Their table incorrectly
>>> shows C(alpha) = 1.5.
>>>
>>> The function D(alpha) is used to compute the additional runtime factor.
>>> However, it is assumed to be no longer than the deepest computation path
>>> required to recompute missing blocks.  This function D(alpha) is a clear
>>> theoretical lower-bound on recomputation speed, but it is incorrect to use
>>> this number in theiir time-area ASIC attack analysis.
>>>
>>> The paper correctly states that the fastest ASIC bandwidths to memory
>>> are on the order of 400 GiB/s.  An ASIC attack will optimize many factors,
>>> and one of the most critical is ASIC cost for the resulting bandwidth.
>>> Assuming this results in a 400 GiB/s ASIC (probably incorrect, but the
>>> analysis is the same for other choices), the runtime is proportional to
>>> C(alpha), the number of recomputations, not D(alpha), the computation tree
>>> depth.  This is because the required bandwidth is increased by a factor of
>>> C(alpha), and the ASIC will be bandwidth limited when attacking Argon2.  I
>>> can think of no reasonable ASIC attack for large-memory hashing (>= 1 GiB)
>>> that would be limited by any other speed factor, as the paper did not
>>> include multiplication based compute-time hardening in this analysis.
>>>
>>> Thus the time is proportional to C(alpha) and the area is proportional
>>> to alpha.  The time-area product is alpha/C(alpha).  When using the
>>> corrected C(alpha) for the simple attack (keeping every 1/alpha block),
>>> C(alpha) is 2, and at the 1/2 memory level time-area = (1/2)/2 = 1.  At
>>> lower alpha (higher memory reduction factors), recomputations increase
>>> faster than the memory reduction, increasing the area-time cost to the ASIC
>>> attacker.  Therefore, there is no choice for alpha that reduces the
>>> time-area product.
>>>
>>> Improved attacks will keep less memory at high memory addresses and more
>>> at lower addresses.  Against a uniform random index function, this can only
>>> decrease the time-area product by about 15% at most.  With the improved
>>> index function, the maximum TMTO improvement is even less.  I believe the
>>> potential benefit is so small, no significant ASIC attack will ever bother
>>> to do a TMTO against 1-pass Argon2d.
>>>
>>> If this were the Argon2 team claiming a competitor had a simple attack
>>> resulting in a 4X area-time reduction, I would be upset.  Since it is their
>>> own algorithm they are talking about, I'll simply correct their math :)
>>>
>>> Bill
>>>
>>
>>
>>
>> --
>> Best regards,
>> Dmitry Khovratovich
>>
>
>
>
> --
> Best regards,
> Dmitry Khovratovich
>

Content of type "text/html" skipped

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ