Date: Tue, 28 Jul 2015 13:36:52 +0200
From: Dmitry Khovratovich <>
To: "" <>
Cc: "" <>
Subject: Re: [PHC] [OT] Improvement to Momentum PoW

I am not sure what you mean by "parallel rho", but the parallel
collision search by van Oorschot and Wiener deals with exactly this
problem. They call it "golden collision" search in Section 4 of .

The tradeoff in that paper is very favourable to the attacker: when the
memory is reduced by a factor of P^2, the time grows only by a factor of
P, so the growth is sublinear. Storing only some small range of Hb(i)
gives a much worse, merely linear, tradeoff.
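For intuition, here is a toy comparison of the two curves, assuming the scalings as stated above (time grows by P when memory is cut by P^2 for the parallel collision search, versus time grows by R when memory is cut by R for the range-restriction TMTO); the constants are purely illustrative:

```python
import math

# Illustrative comparison of the two time-memory tradeoffs discussed above.
# Assumption: vOW parallel collision search pays a time factor of P when
# memory is reduced by P^2; storing only a 1/R range of Hb(i) pays a
# time factor of R when memory is reduced by R.
for mem_reduction in (4, 16, 64, 256):
    vow_time = math.sqrt(mem_reduction)   # P = sqrt(memory reduction)
    linear_time = mem_reduction           # R = memory reduction
    print(f"memory/{mem_reduction}: "
          f"vOW time x{vow_time:g}, linear TMTO time x{linear_time}")
```

The gap widens quickly: at a 256x memory reduction the parallel collision search pays only 16x in time, while the linear tradeoff pays the full 256x.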


On Mon, Jul 27, 2015 at 8:29 PM, Bill Cox <> wrote:

> A problem with Momentum as currently implemented is that it can be run
> with reduced memory, enabling efficient GPU attacks
> <>.
> I think we can get around this problem simply by changing the parameters it
> uses.  Here's what Momentum computes:
> for i, j in [0 .. 2^n-1]:
>     if Hb(i) == Hb(j) and H(i || j) < threshold:
>         report i, j
> In this case, Hb is specific to a block-chain digest and nonce, as is H.
> In the current implementations, n is 26 (so 2^26 indices are hashed) and
> the size of Hb's output is 50 bits.  H's output is far larger, either 256
> or 512 bits.  This leads, on average, to something like 3 collisions per
> nonce tried.
> If instead we run with n == 26 and Hb's output also being 26 bits, then
> we get around 2^26 collisions.  Each needs to be tested to see if H(i || j)
> < threshold.  The obvious algorithm to do this rapidly is:
> Initialize array M to 2^26 empty lists.
> for i in [0 .. 2^n-1]:
>     list = M[Hb(i)]
>     for j in list:
>         # i collides with j
>         if H(i || j) < threshold:
>             report i, j
>         if H(j || i) < threshold:
>             report j, i
>     list.append(i)
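For concreteness, a minimal runnable Python sketch of the bucketed search above. Hb and H are toy SHA-256 stand-ins, THRESHOLD is arbitrary, and N_BITS is shrunk from the post's n == 26 so it runs instantly; all names and parameters here are hypothetical:

```python
import hashlib

N_BITS = 12                # the post uses n == 26; shrunk here for a quick demo
THRESHOLD = 1 << 248       # accept roughly 1/256 of 256-bit H outputs

def Hb(i):
    """Toy birthday hash: n-bit output, same width as the index space."""
    d = hashlib.sha256(b"Hb" + i.to_bytes(4, "big")).digest()
    return int.from_bytes(d, "big") & ((1 << N_BITS) - 1)

def H(i, j):
    """Toy full-width hash of the ordered pair (i, j)."""
    d = hashlib.sha256(b"H" + i.to_bytes(4, "big") + j.to_bytes(4, "big")).digest()
    return int.from_bytes(d, "big")

def find_proofs():
    buckets = [[] for _ in range(1 << N_BITS)]   # M: 2^n empty lists
    proofs = []
    for i in range(1 << N_BITS):
        bucket = buckets[Hb(i)]
        for j in bucket:            # every j already here collides with i under Hb
            if H(i, j) < THRESHOLD:
                proofs.append((i, j))
            if H(j, i) < THRESHOLD:
                proofs.append((j, i))
        bucket.append(i)
    return proofs
```

Every reported pair (a, b) satisfies Hb(a) == Hb(b) and H(a || b) < threshold, and the whole table of 2^n lists must be resident, which is the memory-hardness argument being made.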
> This seems to defeat Bloom filters and similar memory-reduction
> techniques.  While a parallel Pollard rho algorithm would work, it's slow
> and reduces memory no more effectively than the simple TMTO attack where
> we only store some small range of Hb(i) in the hash table to reduce its
> size.
> I am not sure why this approach is not used in current implementations.
> Is there some attack I'm not seeing?
> While this slight modification might enable Momentum to resist GPU
> attacks, its runtime is still dominated by Hb and H calculations, which an
> ASIC will speed up dramatically.  I get around that by generating 32
> non-cryptographic hash values for each cryptographic hash generated,
> getting a speed-up of over 16X.  However, modern Intel CPUs seem to be very
> weak at running buffered radix-sort.  An ASIC will run at full memory
> bandwidth, which might be far higher than a typical CPU's bandwidth.  For
> example, I can fill 1 GiB with unsorted birthday hashes in 0.2 seconds on
> my machine.  When I create 256 buffers of 4KiB each, and do a byte-based
> radix-sort pass, this delay jumps up to about 0.65 seconds.  This is with
> non-temporal streaming writes, so I'm not polluting L2 or L3 cache with my
> buffer writes.  An ASIC would see no such additional delay, and would
> likely have significantly higher bandwidth to memory as well.
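A minimal Python sketch of one such byte-keyed radix pass may help make the operation concrete. The 256-way buffered scatter and the non-temporal stores that matter for bandwidth don't translate to Python, so this only shows the partitioning logic; the function name and signature are hypothetical:

```python
def radix_pass(values, byte_index):
    """One radix-sort pass: partition hash values into 256 buckets
    keyed on a single byte, preserving order within each bucket."""
    buckets = [[] for _ in range(256)]
    shift = 8 * byte_index
    for v in values:
        buckets[(v >> shift) & 0xFF].append(v)
    # Concatenate: values are now grouped (stably) by the chosen byte.
    return [v for b in buckets for v in b]
```

Repeating the pass for each byte position, least-significant first, yields a full LSD radix sort; on a CPU each pass is a bandwidth-bound scatter, which is where the ASIC advantage described above comes from.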
> I think that ASICs will still win significantly vs. CPUs on every
> implementation I've seen of Momentum.  The enhancements above should, if
> I'm not mistaken, at least stop GPUs.
> Bill

Best regards,
Dmitry Khovratovich

