lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 17 Oct 2015 23:01:17 +0300
From: Solar Designer <>
Subject: Re: [PHC] Argon2 PHC release...

On Sat, Oct 17, 2015 at 08:55:27AM -0700, Bill Cox wrote:
> I prefer Argon2id over Argon2d.  Is there any reason to use 2d instead of
> 2id?

I think we should include 2i and 2ds.  I wouldn't mind 2i and 2ids, if
there's demand and that flavor is introduced.  2d and 2id are probably
not worth it: they don't take full advantage of data dependent access,
yet also don't provide the peace of mind that 2i does.

Also, I just started looking at 2ds, and I think it might need a little
bit more work:

Do I understand it right that right now the S-boxes are shared for all
lanes (when there's more than one)?  That's not great if so.  They
should be per lane (they are in yescrypt).  OTOH, it isn't a fatal
drawback either, because for password hashing use on current CPUs we'll
typically have lanes=threads=1.  (We may need more on many-core devices
like Xeon Phi, and on future CPUs, though.)

The source code comment in argon2.h says that 2ds is "20% slower" than
2d.  I think it's a property of the current implementations, which keep
the MAXFORM loop in a separate "for" loop, and most importantly within
an "if".  I think this "if" doesn't let a compiler mix that loop's body
with the following code even when the loop is unrolled due to compiler
optimization flags.  Thus, we get the sum of two latencies (for MAXFORM
and for Argon2's block compression function) rather than max of the two.
(I am simplifying, as scalar and SIMD aren't entirely independent: they
compete for instruction decode, etc.  So slight slowdown is expected,
but 20% is worse than expected.)  To optimize this, we may try declaring
an inline FillBlock_body() function, and calling it twice from a wrapper
FillBlock() function, passing a Boolean constant for the "if".  The
intent would be to tell the compiler to specialize the code for the two
cases.  We can do this post-PHC, unless this 20% thing determines our
(non-)inclusion of 2ds in a PHC-official implementation.

Right now, 2ds doesn't appear to be included in the would-be PHC release
that JP et al. started work on (BTW, thanks for the effort!)  I think
that ideally we'd include it instead of 2d, and make its S-boxes per-lane.


Powered by blists - more mailing lists