phc-discussions - Thoughts about Argon2 PoW system

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOLP8p4B+eUtEcQxSHNpSs=RiZAiy=Saxa4yn0GH+b5HtM8_0Q@mail.gmail.com>
Date: Mon, 6 Jul 2015 16:34:32 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Thoughts about Argon2 PoW system

First, kudos to the Argon team.  SFAIK, their algorithm actually works, and
is the only published memory-hard PoW that actually does.  However, I don't
think it's ready for use, yet.

In short, their algorithm is to build a Merkel hash tree for all of the
memory filled with their memory-hard hashing algorithm.  It is generic, and
can be applied to any of the memory-hard algorithms.  They essentially do
this poof of work algorithm <http://eprint.iacr.org/2007/433.pdf>, but with
the leaves replaced with the memory contents after the memory-hard hashing
algorithm runs.

The biggest problem I see is that they have to hash all the data, with a
full cryptographically strong hash function.  Any data not fully hashed is
not cryptographically tied to the Merkel hash root, and could be forged.
The full Blake2b is fast, but this additional post-memory-hashing Merkel
tree building step will dominate the runtime on any CPU.  Worse, an ASIC
attacker will get something like a 10X to 30X performance boost in this
step, with lower power consumption.  This is disastrous, because the main
point of memory-hard PoW systems is to reduce the advantage gained by using
custom ASICs.

I do not see any solution on x86 processors other than to use the built-in
crypto instructions
<http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/haswell-cryptographic-performance-paper.pdf>
to reduce the difference in performance relative to ASICs.  What would the
best primitive be to use in this case?

The SHA256 instructions seems pretty slow, at 2.67 cycles per byte.  I
suspect an ASIC will run much faster.  AES-GCM-encrypt is reasonably fast
at 1.26 bytes per cycle, with a resulting 128-bit authentication tag.
However, that's still too slow.  PHC finalists Argon2, Lyra2, and Yescrypt
all max out my 25GiB/s memory bandwidth when running enough threads.  I
wont be able to do that using any of Intel's crypto primitives and full
cryptographic hashing.

One improvement to the Argon2 PoW algorithm would be to have a memory-hard
main loop like:

    mem[0] = H(digest, nounce)
    for i = 0 to memSize-1 {
        randAddress = mem[i-1] % i;
        mem[i] = H2(mem[i-1], mem[randAddress]);
    }

This is very similar to the current Argon2d loop, but the block hashing is
replaced with H2, which could be SHA256(a, b), for example.  After filling
memory, the final hash value  written is the Merkel hash root.  Use it to
compute a random path of length L from the root to mem[0].  So long as L is
secure number of hops, like (128?), then then hashes used to "open" mem[0]
should constitute a secure proof that the memory-hard hashing algorithm has
been run.

In comparison to the Argon2 current PoW system, this performs half the
number of full hashes, since the Merkel tree is embedded in the data used
to fill memory, rather than computed as a post-process.  This should enable
it to have a 4x higher memory*time defense if memory bandwidth bound.  The
proof would be about the same length (num-bits of security times hash
length).  However, it is no longer generic like the current Argon2 PoW
algorithm.  Also, Argon2's algorithm is more familiar to people who know
about the Merkel hash tree PoW algorithm.

Even with a 2X speedup and Intel crypto-intstructions, the algorithm is not
close to filling my CPU's I/O bandwidth.  A GPU might do better, but an
ASIC can do it faster and at far lower power.  I think these memory-hard
PoW algorithms with fast verification are not quite ready for prime-time.
Can anyone see a solution to the speed problem?  If ASICs will be a lot
faster and lower power, then there's no point using the algorithm, IMO.

Bill

Content of type "text/html" skipped