lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Sun, 5 Jan 2014 15:47:28 -0500
From: Bill Cox <>
Subject: Security concern for Catena

I just skimmed through the paper on Catena.  Not that my opinion counts,
but it's a very nice piece of work!  Many ideas in Catena are very close to
what I was looking for in password security.

That said, I see a potential security flaw.  Let's say an attacker has a
goal of 100X reduction in the memory cost.  Let's assume the hash function
is SHA-256, and that V(i+1) = SHA-256(V(i)).  In the first loop, we fill
memory with the chained SHA-256 output.  An attacker has to compute it all.
 Instead of storing every memory location as he goes, he only stores 1 in
100 locations, saving 100X the memory area and bandwidth.

In the second loop, he accesses memory in a random-ish but pre-determined
pattern.  Again, he must have the values at each of these random-ish
locations and is forced to recompute them from the saved SHA-256 data,
which on average will be 50 rounds of SHA-256 to provide the missing data.
 So far, so good, because our area*time has not changed by more than a
small constant.

However, because the attacker knows the next 100 memory locations that need
to be computed, he can have 100 SHA-256 hashing cores working in parallel
to provide the needed data, sort of like doing a memory prefetch.  Every
cycle, we just take an idle core and put it to work computing the SHA-256
data we'll need in 100 cycles.  That core will have enough time if it
produces one SHA-256 hash per cycle.  By the time the attacker needs the
data, it's ready and waiting for him.

This whole custom ASIC uses 1/100th of the external memory and bandwidth,
but has 100 on-chip SHA-256 hashing cores.  Maybe that many cores costs a
bit of die area, but it's tiny compared to the multi-GB of RAM that the
attacker may have saved.

This attack is what originally convinced me that the memory-read pattern
has to depend on the password.  Now I'm not so sure.  If Catena were
modified to work like NoelKDF, it just might stop such an attack.  In
particular, instead of filling memory with a simple hash chain, why not add
a random-ish previous value to the hash in addition to the prior value?  If
you do this, and an attacker keeps only 1 out of 100 SHA-256 hashes, he's
in trouble.  To generate V(i+1) he needs V(i) which is no problem, but he
also needs V(random-ish value < i).  Using the bit reversal of i, then
right-shifting until it's less than i might work as the index for a
random-ish prior value.  Even better, use Alexander's sliding power-of-2
window to convert the bit reversal of i into the random-ish address with a
bitwise AND and addition.  If the attacker does not have that value saved,
he'll have to recompute it, and unfortunately for him, he'll need the prior
value to that plus the random-ish prior value.  This blows up
exponentially, and he'll have to recompute most missing values < i.  This
is how NoelKDF is secure from this attack.

I see no way for an attacker to reduce memory by 100X and avoid having to
recompute most of the missing values for each V(i).  Each value, instead of
depending only on the prior value, now is the result of a tree-like DAG of
prior values.  He needs a full DAG cut to recompute the output.  Over half
of the last half of values depend on the first half of values, and a cut of
the DAG for computing the lasts value at the middle would have an insane
number of of edges.  They would point to over half of the first half of

If you guys agree with my attack and potential solution, I'll try to
prototype it this week in NoelKDF.  It would be pretty cool to eliminate
the timing attacks.  A side benefit of this scheme is that you no longer
need the second loop at all.  The run-time should be very similar to the
original Catena scheme, because you'll still need to do one write and one
read per memory location, and two hashes.  I still don't think there's any
need for something as solid as SHA-256 for the hash.  I'll try the one I'm
currently using that is just 4 operators: two additions, a 32-bit multiply,
and an XOR.  Another advantage of Catena/NoelKDF would be very high speed.
 Another minor gripe I have about Catena is that it has zero room for
parallelism, other than what the hash function has, and the completely
random memory access will cause runtime to be dominated by cache miss
penalties.  By hashing "pages" of data together rather than completely
random locations, we eliminate most of the cache-miss penalty, and by
hashing random-ish "sequences" of sequential values within those pages, we
have a flexible way of tuning available parallelism to the application.

I don't want to sound down on Catena.  I would rave about the positives I
saw in that paper, but I've gone on too long already, and my opinion really
doesn't count much anyway.  I love the client/server split of
responsibility for the hash, and just about everything about Catena.  Like
I said, it's very nice work.  Hopefully my attack is BS, or we can fix
Catena with a minor change.


Content of type "text/html" skipped

Powered by blists - more mailing lists