lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 13 Jul 2015 20:04:28 +0200
From: Dmitry Khovratovich <>
To: "" <>
Subject: Re: [PHC] Thoughts about Argon2 PoW system

Hi Bill,

thank you for reviewing our scheme.

Before looking deeply into you suggested modification, just one idea. Did
you consider parallel hashing for building the Merkle tree? Since you
already employ, say, 4 cores for multi-threaded Argon2d, how about using
them for Blake2bp? The designers report about 6x-speedup compared to the
single-core version , so this should
be about 0.5 cycles per byte, even faster than Argon2d.

Best regards,

On Tue, Jul 7, 2015 at 1:34 AM, Bill Cox <> wrote:

> First, kudos to the Argon team.  SFAIK, their algorithm actually works,
> and is the only published memory-hard PoW that actually does.  However, I
> don't think it's ready for use, yet.
> In short, their algorithm is to build a Merkel hash tree for all of the
> memory filled with their memory-hard hashing algorithm.  It is generic, and
> can be applied to any of the memory-hard algorithms.  They essentially do
> this poof of work algorithm <>, but
> with the leaves replaced with the memory contents after the memory-hard
> hashing algorithm runs.
> The biggest problem I see is that they have to hash all the data, with a
> full cryptographically strong hash function.  Any data not fully hashed is
> not cryptographically tied to the Merkel hash root, and could be forged.
> The full Blake2b is fast, but this additional post-memory-hashing Merkel
> tree building step will dominate the runtime on any CPU.  Worse, an ASIC
> attacker will get something like a 10X to 30X performance boost in this
> step, with lower power consumption.  This is disastrous, because the main
> point of memory-hard PoW systems is to reduce the advantage gained by using
> custom ASICs.
> I do not see any solution on x86 processors other than to use the built-in
> crypto instructions
> <>
> to reduce the difference in performance relative to ASICs.  What would the
> best primitive be to use in this case?
> The SHA256 instructions seems pretty slow, at 2.67 cycles per byte.  I
> suspect an ASIC will run much faster.  AES-GCM-encrypt is reasonably fast
> at 1.26 bytes per cycle, with a resulting 128-bit authentication tag.
> However, that's still too slow.  PHC finalists Argon2, Lyra2, and Yescrypt
> all max out my 25GiB/s memory bandwidth when running enough threads.  I
> wont be able to do that using any of Intel's crypto primitives and full
> cryptographic hashing.
> One improvement to the Argon2 PoW algorithm would be to have a memory-hard
> main loop like:
>     mem[0] = H(digest, nounce)
>     for i = 0 to memSize-1 {
>         randAddress = mem[i-1] % i;
>         mem[i] = H2(mem[i-1], mem[randAddress]);
>     }
> This is very similar to the current Argon2d loop, but the block hashing is
> replaced with H2, which could be SHA256(a, b), for example.  After filling
> memory, the final hash value  written is the Merkel hash root.  Use it to
> compute a random path of length L from the root to mem[0].  So long as L is
> secure number of hops, like (128?), then then hashes used to "open" mem[0]
> should constitute a secure proof that the memory-hard hashing algorithm has
> been run.
> In comparison to the Argon2 current PoW system, this performs half the
> number of full hashes, since the Merkel tree is embedded in the data used
> to fill memory, rather than computed as a post-process.  This should enable
> it to have a 4x higher memory*time defense if memory bandwidth bound.  The
> proof would be about the same length (num-bits of security times hash
> length).  However, it is no longer generic like the current Argon2 PoW
> algorithm.  Also, Argon2's algorithm is more familiar to people who know
> about the Merkel hash tree PoW algorithm.
> Even with a 2X speedup and Intel crypto-intstructions, the algorithm is
> not close to filling my CPU's I/O bandwidth.  A GPU might do better, but an
> ASIC can do it faster and at far lower power.  I think these memory-hard
> PoW algorithms with fast verification are not quite ready for prime-time.
> Can anyone see a solution to the speed problem?  If ASICs will be a lot
> faster and lower power, then there's no point using the algorithm, IMO.
> Bill

Best regards,
Dmitry Khovratovich

Content of type "text/html" skipped

Powered by blists - more mailing lists