Date: Tue, 02 Sep 2014 21:50:31 -0400
From: Bill Cox <>
Subject: Re: [PHC] A review per day - Parallela

The basic idea of Parallela: use your GPU for defensive purposes.

Basically, my notes were not very accurate.  First, this is primarily
an algorithm designed to make defensive use of your GPU, and maybe
FPGAs.  It runs a very large number of parallel SHA512 hashes
(3*5*128*loopCount), and then does as many of these in series as you
like.  I am a fan of defensive use of GPUs.  Against GPU farms, it
puts you basically on par with them.
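The shape described above (a wide batch of independent hashes, then a serial chain of such batches) can be sketched roughly as follows. This is only an illustration of the structure, not the submission's code: toy_hash is a non-cryptographic stand-in for SHA512, and the function names, the 64-bit state, and the XOR combining step are all assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one SHA512 call: a 64-bit mixer (splitmix64 finalizer).
 * NOT cryptographic; it only marks where a real hash would be invoked. */
static uint64_t toy_hash(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Number of independent hashes in one parallel round, per the review:
 * 3*5*128*loopCount = 1920 * loopCount. */
static uint64_t parallel_hash_count(uint64_t loop_count)
{
    return 3u * 5u * 128u * loop_count;
}

/* One parallel round: each lane depends only on the incoming state and
 * its own index, so all lanes could run at once (one per GPU thread). */
static uint64_t parallel_round(uint64_t state, uint64_t loop_count)
{
    uint64_t lanes = parallel_hash_count(loop_count);
    uint64_t acc = 0;

    for (uint64_t i = 0; i < lanes; i++)
        acc ^= toy_hash(state + i);   /* XOR combining is an assumption */
    return toy_hash(acc);
}

/* Serial chain: round r cannot start until round r-1 has finished. */
static uint64_t toy_parallel_kdf(uint64_t seed, uint64_t loop_count,
                                 uint64_t serial_rounds)
{
    uint64_t state = toy_hash(seed);

    for (uint64_t r = 0; r < serial_rounds; r++)
        state = parallel_round(state, loop_count);
    return state;
}
```

With loop_count = 1 each round is 1920 independent hashes; serial_rounds sets how many such rounds have to be done back to back, which is the part extra parallel hardware cannot shorten.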

The code is bigger than I'd thought, but mostly due to portability
issues.  It includes hundreds of lines of stdint.h and inttypes.h to
port between GNU and Visual C++.  On the positive side, it looks like
a lot of porting work has already been done.  On the downside, for a
primarily GPU algorithm I'd really like to see some CUDA code (or the
other one).

The main function is in a file only 150 lines long, and it was easy to
read.  Here's a simple dump of my notes:

- Minor gripe: why not use m_cost for the serial parameter?  Encoding
two time values in t_cost is a pain.
- Deals with practical details, like zeroing out before returning an
error code.  This might help inform a user that there is something
that needs attention.
- Feeds salt into SHA512 first rather than the password.  Someone was
saying that's preferred, but I don't know why, when our crypto-hash
primitives are solid little mega-scrambling machines.
- line 145: "// TODO: find a secure wipe function" - I use blake2's:

#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t */

/* prevents compiler optimizing out memset() */
static inline void secure_zero_memory( void *v, size_t n )
{
  volatile uint8_t *p = ( volatile uint8_t * )v;

  while( n-- ) *p++ = 0;
}

- No threads or OpenMP support, because it's designed for GPUs.
Where's the CUDA?  No point benchmarking yet.
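Two of the notes above, zeroing the output before returning an error and wiping working memory, fit together in a sketch like the one below. toy_hash_password is a hypothetical entry point with a placeholder derivation, not the submission's real API; only the volatile-store wipe mirrors the blake2 helper quoted above. Where available, explicit_bzero (glibc, BSDs) or C11 Annex K memset_s would be alternatives to the hand-rolled loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Volatile stores cannot be dropped by the optimizer the way a
 * trailing memset() on a dead buffer can. */
static void secure_zero_memory(void *v, size_t n)
{
    volatile uint8_t *p = (volatile uint8_t *)v;

    while (n--) *p++ = 0;
}

/* Hypothetical hashing entry point (an assumption for illustration).
 * Returns 0 on success, -1 on error. */
static int toy_hash_password(uint8_t *out, size_t out_len,
                             const uint8_t *pwd, size_t pwd_len)
{
    uint8_t work[64];

    if (out == NULL || out_len == 0)
        return -1;
    if (pwd == NULL || pwd_len == 0 || out_len > sizeof work) {
        /* Never leave stale or partial bytes in the output on error. */
        secure_zero_memory(out, out_len);
        return -1;
    }

    /* Placeholder derivation: a real KDF would run its hashes here. */
    memset(work, 0, sizeof work);
    for (size_t i = 0; i < pwd_len; i++)
        work[i % sizeof work] ^= pwd[i];
    memcpy(out, work, out_len);

    /* Wipe the working buffer so key material does not linger. */
    secure_zero_memory(work, sizeof work);
    return 0;
}
```

A caller that ignores the return code then ends up with an all-zero hash, which is much easier to notice than whatever happened to be in the buffer.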

I take issue with one statement in the paper:

"The attacker-defender ratio is 1. Any advancements in cracking are
advancements for the defender. If ASICs come out that can crack this
hash they more than likely can be used by the defender."

I doubt the NSA is going to sell you their ASICs!  This is OK defense
for a system with a GPU, where we worry that SHA512 hashing by the
attacker is also done with GPUs.  However, we have SHA256 Bitcoin
ASICs that blow away GPUs.  SHA512 hashing is just twice the
transistors per core.  ASICs will rule here, and we still can't buy
defensive SHA256 ASICs.  There's probably some secret law about that.

There is also a paralleKDF function that is capable of outputting
variable length data.  I am not sure that there need to be two
functions here.  It seems like they could be combined into one.

That's pretty much all I found to talk about in Parallela.  The idea
is a good one.  I'm not sure I can really do a lot more.  There's
really no point in running Dieharder checks on SHA512 output, and
without CUDA code and a GPU, I can't really benchmark it, SFAIK.
