lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 16 Nov 2013 14:11:28 -0500
From: Daniel Franke <>
To: Solar Designer <>
Subject: Re: [PHC] The EARWORM password hash

Thanks a bunch for the feedback. There's a lot of it. I'm only getting
to about half of it in this reply.

Solar Designer <> writes:

> Right.  How do you propose the ROM be initialized?  Is this within scope
> of EARWORM itself or totally external to it?

The way I do it for my test vectors is to generate an AES-CTR keystream
from the key "dontusethiskeyinproduction". What I have in mind for
production use is to give the user a program that reads a key from
/dev/[u]random, generates an arena from it and writes it to disk, and
then prints the key to stdout with instructions to write it down and
store it offline. The user can then regenerate or enlarge the arena by
resupplying the key. This will all be discussed in my submission
document, but only in the spec proper insofar as the test vectors are
concerned. The rest will be relegated to the "Usage Considerations"

I considered ideas similar to yours for speeding up initialization time,
but couldn't convince myself that anything I thought of was secure. I
eventually just gave up because I don't think most users are going to be
willing to use an arena large enough to make addressing the issue worth
the added complexity.

> The above has what we'd consider a flaw under our threat model (attackers
> with many smaller memory machines), but you might not (you mentioned
> using this with only 256 MiB anyway).  scratchpad[] elements beyond
> index 0 are only needed for the final prf() computation, so instead of
> working on the full arena[] under each workunit, an implementation could
> efficiently work on portions of the (re-structured) arena[] on a machine
> with a smaller amount of fast memory (and swap those portions outside of
> the "for d from 0 to (WORKUNIT_DEPTH / 2)" loop).

I only used the 256MiB arena for the sake of GPU experimentation (my
7850 has a 512MiB contiguous-allocation limit). In practice, I'm
picturing sizes from 1-16 GiB. The idea of trying to compete with
attackers on the total memory capacity of a single node is not one I've
attempted to pursue, but if somebody convinces me that it's a good idea
then the tweak necessary to defeat this attack is pretty
straightforward. Just flatten the array and allow chunks to start at any

> Have you experimented with such prefetching (using the corresponding CPU
> instructions)?  Similarly, have you tried removing the second index
> variable (and thus the ability to prefetch this much in advance) - does
> it affect performance on current CPUs or does it not matter (yet)?

Yes, my optimized implementation contains prefetch instructions. I get
about a 5% performance hit if I comment them out.

Powered by blists - more mailing lists