phc-discussions - Re: [PHC] Bandwidth hardened algorithms?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOLP8p6HaiBVivFSQj3hwSvSLYk+zARB8p7n9bPWhQ54-hkuAA@mail.gmail.com>
Date: Mon, 13 Jul 2015 14:39:12 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Cc: John Tromp <john.tromp@...il.com>
Subject: Re: [PHC] Bandwidth hardened algorithms?

On Mon, Jul 13, 2015 at 10:56 AM, Dmitry Khovratovich <
khovratovich@...il.com> wrote:

> Hi Bill,
>
> As I understand you deal with the attacks that require a lot of bandwidth
> if run in parallel. This is the same effect that we encountered in our
> tradeoff attacks, where I remember your bandwidth-related critique.
>

That's correct.

> I think both bandwidth- and memory- hardening fit into more generalized
> framework of cost-hardening, where we want to lower bound the ASIC running
> costs. We already have a good/standard metric for the cost --- it is the
> time-area product (AT). Costs of many parallel algorithms have been
> evaluated in this metrics, and we know how to compare them.
>
> In the AT metric both memory and bandwidth are related to the area. Area
> grows linearly with the memory. On the other hand, in the 2D view,
> bandwidth is probably proportional to the area as well. At least, it is
> easy to imagine a twice larger chip that has double bandwidth. Similarly to
> counting how much memory fits into a mm^2, we could count how much
> bandwidth a 1-mm^2 chip can provide.
>
> Therefore, if we fix the area, we implicitly bound both memory size and
> bandwidth. The notion of bandwidth hardness then translates to a lower
> bound on time as a function of area, i.e. to the lower bound on the cost.
> The same we already have for the memory, where memory-hardness implies a
> lower bound on the amount of computations needed to compute the function
> and thus to the lower bound on the time, since the parallelism is bounded
> by the area.
>

I agree.  I think we can generally equate bandwidth cost to silicon area
cost.  In most chips, bandwidth is limited by I/Os, and the area grows as
the square of the number of I/Os for larger chips where the pad ring does
not dominate the area.  For chips where the I/O ring dominates, the
relationship is only linear.  One fear I have of bandwidth-hardening is
that it is possible to build very wide and short chips that have a high I/O
area to core area, so an attacker can probably see the linear cost
dependence, rather than the squared cost.  The chips I know of with very
high bandwidth are all nearly square: Intel processors, DRAM, FPGAs, and
some high I/O bandwidth ASICs I've seen.  I suspect a custom ASIC which
optimized it's high-speed serial I/Os with only a small core area could be
far more cost efficient than what I've seen so far.

So, I prefer algorithms that are both a memory and bandwidth hardened, such
as my favorite 3 PHC finalists, Yescrypt, Lyra2, and Argon2d.

> ----------------
>
> As a side note, I find strange that you still consider Momentum.
> Memoryless collision search is so well known and explored, that this scheme
> is barely better than plain SHA-256.
>
> Best regards,
> Dmitry
>

The parallel Pollard Rho algorithm would make short-work of Momentum,
except that when they actually implemented it, they changed it slightly
from the description in the paper.  In particular, the implementation maps
26-bit input hashes to 50-bit output hashes.  Any collisions we find in
paths to distinguished points using parallel Pollard Rho are far more
likely to be input collisions, which don't count, than output collisions.
AFAICT, roughly having twice the output bit width as input defeats parallel
Pollard Rho.

The best algorithms posted online so far to attack the version of Momentum
used in the BitShare crypto-currency all involve a substantial amount of
memory hashing.  None I read about use less than 256 MiB of memory which is
only half the memory needed to represent all 2^26 50-bit hashes.  The main
weakness for ASIC defense in these algorithms is that they are all
memory-latency bound rather than memory-bandwidth bound.  With an FPGA or
ASIC, I think the row-switching latency can be mostly eliminated.  This is
similar to my reservations about using full Blake2b hashes in a Merkel hash
tree to build a memory-hard PoW with fast verification.  An ASIC will
eliminate these delays, and be purely memory bandwidth bound.

I still have not seen a good PoW with rapid verification which I believe
will succeed in defending against ASICs in a popular crypto-currency mining
situation.  It feels like we are close.  Momentum needs to fix it's memory
latency issue.  Argon2d needs to fix it's cryptographic hashing latency
issue.  An ASIC implementation will not be bothered by either issue.  I
haven't figured out any solution, though.

Bill

Content of type "text/html" skipped