phc-discussions - Re: [PHC] Tradeoff cryptanalysis of password hashing schemes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 8 Aug 2014 17:14:46 -0400
From: Bill Cox <waywardgeek@...il.com>
To: Dmitry Khovratovich <khovratovich@...il.com>
Cc: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>, ben@...rr.is
Subject: Re: [PHC] Tradeoff cryptanalysis of password hashing schemes

On Fri, Aug 8, 2014 at 2:02 PM, Dmitry Khovratovich <khovratovich@...il.com>
wrote:

> Hi Bill,
>
> 1. Yes, the tradeoff should be MT = 16N^2 indeed.
>

Ah... so I wasn't crazy :-)  That makes me feel better.


> 2. Constants like 2^18 do not matter for deciding whether proof is correct
> or not. Our attack shows that q^lambda does not appear for q<N^{1/3}. With
> some improvement, you can have the penalty O(q^2) for q<N^{0.42} for any
> constant lambda. The gap between N^{0.42} and N^{0.5} is interesting to
> explore, but it is not so much important.
>

I had some issues with the proof actually.  Mostly it was just that the
constant kept getting *much* bigger with increasing lambda, as if it
depends somehow on lambda.  So, q == N^(1/3) is still to small?  My proofs
for my dismal lower bounds all used q = N(1/2).


> 3. I do not understand what is "lower bound for low q for Catena".
>

I meant a lower bound on the computation penalty for the kinds of q's we've
been looking at, like 1/2, 1/4, and 1/8.  If you plug these into the Catena
lower bound model, it says that at least a small fraction of at least one
hash will be required :-)  Not a very practical lower bound, IMO.  Factors
of 2^18 do tend to be important in benchmarks.


> 4. I can try to break your primary submission, if you give some explicit
> security claims for your fastest secure set of parameters. This applies to
> all other candidates, actually.
>

The most secure parameters are those that maximize memory use while
providing some reasonable compute-time hardening.  This is entirely machine
dependent, as well as application dependent.

However, I should be expected to defend the security of my default hashing
parameters:

#define TWOCATS_MEMCOST 20 // 1 GiB
#define TWOCATS_PARALLELISM 2
#define TWOCATS_BLOCKSIZE 16384
#define TWOCATS_SUBBLOCKSIZE 64
#define TWOCATS_TIMECOST 0
#define TWOCATS_MULTIPLIES 2
#define TWOCATS_HASHTYPE TWOCATS_BLAKE2S
#define TWOCATS_LANES 8
#define TWOCATS_OVERWRITECOST 6

This runs two threads in parallel to hash 1GiB of memory.  It likely gains
some sort of GPU resistance with SUBBLOCKSIZE set to 64, as this causes
fairly small random L1 cache reads at high speed.  With 2 multiplies, it's
likely 30%-ish compute-time hardened, (meaning an ASIC gets a free 3X-ish
speedup of multiplies).  However, this depends on the machine.  Ideally the
multiplication chain speed is well matched to the memory hashing speed.
Overwrite cost of 6 means we waste the first 1/32nd of runtime overwriting
the same memory over and over, so that if an attacker gets a memory
snapshot, we still have some defense.

On my test machine, running "twocats" with no parameters benchmarks these
these values.  It runs between 0.2s and 0.24s, hashes 1GiB of DRAM, doing
one write and one read to every memory location.

Unfortunately, there's not a lot there to attack, other than verifying
benchmarks.  However, I did make several claims that could be attacked :-)
Here are some:

- TwoCats limits a custom ASIC's speed advantage to about 2X per core
(through multiplication chains).  Solar Designer now has me worried there
might be attacks closer to 4X faster, with custom multiplier
architectures.  Assumes same process for the ASIC and CPU.
- Only minor known attacks for reducing M*T, limited to around a 10% M*T
bonus for the attacker, with q < 4.  For q >= 4, I claim M*T is higher,
with no M*T bonus to the attacker.
- For q == 8, my best pebbling  of a 1024 node graph had a 973X penalty.  I
do not feel this is a good lower bound, though.  However, if you can do it
in < 100X, I will be very impressed.
- The derived key cannot be inverted to reveal the password significantly
faster than brute force (assumes security of H is similar)
- The input collision strength should be as good as H.
- The output changes unpredictably, and seemingly random with any input
change, defeating differential attacks and such.  Again depends on security
claims of H.
- The hash function looses negligible entropy, even when filling many GiB
of RAM
- TwoCats does no key material derived addressing or branching while
filling the first half of memory.  This provides some off-line brute force
defense in the case of cache timing leaks, but does not protect meta data
that may also leak.

Your power analysis per guess is very interesting.  You assume power is
proportional to the amount of memory used (1/q), and give password crackers
running with a high q highly reduced power.  For your Catena pebbling
algorithm, you may be able to avoid increasing read/writes to DRAM because
of predictable read/writes.  However, the Scrypt inspired algorithms that
spend 50% or more of their time doing unpredictable read/writes cannot
easily reduce their bandwidth to DRAM through a TMTO.  When a value has to
be recomputed, it will depend on other values in DRAM, leading to more,
rather than fewer power hungry read operations.  Memory bandwidth can also
quickly become a bottleneck in an ASIC attack.  Certainly power will not
drop proportionally with q.  My guess is that it will increase.

The Argon ASIC has some really awesome DRAM!  I want some of that DRAM that
let's me r/w 21GiB in 0.02s!  I suspect there's a typo or I'm
misunderstanding the table...  Also, on my machine, 1GiB hashing using the
PHS interface to Argon takes 3.5 seconds (t_cost = 3, m_cost = 1024*1024, 1
thread, Argon PHS defaults).  It seems to be CPU bound rather than memory
bandwidth bound.  I would expect an ASIC to flip this situation so that it
becomes memory bandwidth bound.  Please let me know if I've compiled with
the wrong parameters or such!

Maximizing an attacker's power per guess is a very cool idea.  I think any
of the entries that do unpredictable addressing and have settings that
enable them to max out memory bandwidth will all be somewhat similar.
Honestly, Yescrypt has the best balance of maxing out computation cores and
DRAM bandwidth at the same time.  It probably has a bit of an edge in terms
of power per guess hardness.  My own TwoCats is a bit light on the CPU
thrashing, and heavy on the DRAM read/write, so I'll have to rely on DRAM
being the power monger.

Bill

Content of type "text/html" skipped