lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 20 Jan 2014 08:38:50 -0500
From: Bill Cox <waywardgeek@...il.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Native server relief support for password hashing in browsers

On Sun, Jan 19, 2014 at 9:41 PM, Steve Thomas <steve@...tu.com> wrote:
>> On January 19, 2014 at 6:47 AM Bill Cox <waywardgeek@...il.com> wrote:
>> I would hate to see entries like Blakerypt bow out because of Catena.
>> If I drop out, it will be because of excellent real-world work like
>> escrypt. Catena has some excellent theoretical work, but until I see
>> an implementation that begins to approach the security of Scrypt, I
>> don't see it as a real contender.
>
> So I guess you've seen this ( https://github.com/cforler/catena) and
> benchmarked both of them. What were your results? I've been meaning
> to do this for a while. Along with others.

Thanks for the link!  I don't know why I had not seen it before.  I
just ran catena-blake2b-opt-test on my development machine, and I got
this:

runtime: 0.296 seconds
garlic: 18
lambda: 3
memory: 64*2^18 = 16MB
memory access: 5 reads, 4 writes: 9*16MB = 144MB
bandwidth: 487 MB/s

I could have some of these numbers wrong... blake2 runs a bit faster
than SHA512, so I only benchmarked blake2b.

For comparison, the default build of scrypt on the same machine
(running 64-bit arch linux) fills 500MB in 1.8 seconds, doing 1 read
and 1 write.  If fills 5X more memory in the same time, though it's
memory bandwidth is only 16% higher.

The main difference between the two seems to be that Catena does 9
read/write per memory location, while Scrypt does 2.

>> The Catena paper blew me away and I stole some of their ideas
>> wholesale. However, it seems like an academic solution at this stage.
>> It has nothing to support modern memory architectures, no useful
>> parallelism, and no high performance hashing. Of course, all of that
>> can be added, but until I see a real-world solution, it remains
>> theoretical work in my mind.
>
> Catena uses Blake2b which has parallelism in it. SHA512 was added
> because there was a PHP implementation that used SHA512. SHA512 is
> built into PHP and will be faster than Blake2b written in PHP.

The implicit parallelism in Blake2b and other hashing algorithms is a
bad thing for memory-hard KDFs, IMO.  Any parallelism that our CPUs
can't take advantage of is an advantage to the attacker.  By "useful
parallelism", I mean a way to spread a hashing algorithm across many
CPUs so they can work in parallel.  Ideally, we match the parallelism
to the machine doing the hashing, whether it's a single-core ARM
processor, or a 4096-core GPU.  It's nice to have that built into the
hashing algorithm, IMO, though it's a minor issue.  You can always
wrap the hashing algorithm with an outer loop to get the parallelism,
but then you give an attacker a free time-memory tradeoff.

IMO, this implementation of Catena provides questionable security
compared to Scrypt, while escrypt seems to take security to a new
level.  The anti-timing attack is super-cool, but is it worth dropping
memory usage 5X?  It seems to me that there are 3 issues holding
Catena's performance back, but all of them could be fixed:

- Cantena-3 only uses 1/4 of the memory it computes.  Why not have a
final Catena row that hashes data from all 4 previous rows?
- small hashing size (64 bytes) means cache misses will dominate for
memory >> cache size.  Why not hash 1024 bytes at a time like Scrypt?
- Even blake2b is too slow for this competition, though I seem to
remain in the minority in feeling this way.  What's wrong with r[i] =
r[i-1]*(prevRow[reverse(i)] | 1) + 12345?

Bill

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ