Date: Fri, 3 Jul 2015 05:49:55 -0700
From: Bill Cox <waywardgeek@...il.com>
To: "discussions@...sword-hashing.net" <discussions@...sword-hashing.net>
Subject: Re: [PHC] Password hashing as a self-overwriting Turing machine: new
 version (July 3)

On Thu, Jul 2, 2015 at 11:10 PM, denis bider <pwhashing@...isbider.com>
wrote:

> I have addressed reported shortcomings in BusyBeaver. Most of them (if not
> all?) have been pointed out by Bill Cox.
>
> The new latest version is here:
>
> http://www.denisbider.com/BusyBeaver-20150703.zip
>

Wow, that upgrade was fast!  Nice work.


>
> Changes:
>
> (1) A *parallel implementation*, ParaBeaver, is now implemented
> separately in ParaBeaver.h/.cpp. It uses C++11 threading in order to run a
> number of concurrent threads processing BusyBeaver instances. ParaBeaver is
> useful on clients that can expect to have most cores idle, and can benefit
> from parallelization to gain security at no cost to the user. ParaBeaver
> works by allocating room for N number of digests, which are computed in
> parallel based on N number of variations on original salt and password
> inputs. When all N digests have been computed, a hash of the concatenated
> digests forms the final result. In the special case when N=1, the final
> result equals the result of the one BusyBeaver instance. This is a very
> general parallelization construct that would work well with practically any
> similar serial algorithm.
>

This is close, but now you've given an attacker a free time-memory
trade-off.  He can compute each of the N hashes in parallel, or
sequentially.  To mostly defeat that, data should be copied between
instances at some point.  What I did in TwoCats was sync threads at 1/4
memory filling intervals, and allowed threads to access any data from any
thread from a prior filled interval.  IIRC, Yescrypt and Lyra2 have
something slightly simpler, but good enough.
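
To make the trade-off concrete, here is a minimal sketch of the construct
described in (1).  It is not ParaBeaver's actual code: instance_digest is a
stand-in for one BusyBeaver instance (PBKDF2-HMAC-SHA512, used only so the
sketch runs), and deriving each lane's input by appending the lane index to
the salt is an assumption, since the post does not say how the variations
are formed.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def instance_digest(password: bytes, salt: bytes) -> bytes:
    # Stand-in for one BusyBeaver instance (assumption: PBKDF2 is used here
    # only so the sketch runs; it is not the real algorithm).
    return hashlib.pbkdf2_hmac("sha512", password, salt, 1000)


def lane_salt(salt: bytes, lane: int) -> bytes:
    # Hypothetical per-lane variation: append the lane index to the salt.
    return salt + lane.to_bytes(4, "big")


def combine(digests: list) -> bytes:
    # N=1 special case: the result is the single instance's digest.
    if len(digests) == 1:
        return digests[0]
    # Otherwise the final result is a hash of the concatenated digests.
    return hashlib.sha512(b"".join(digests)).digest()


def hash_parallel(password: bytes, salt: bytes, n: int) -> bytes:
    # Defender's view: all N lanes run concurrently, N instances in memory.
    with ThreadPoolExecutor(max_workers=n) as pool:
        digests = list(
            pool.map(lambda i: instance_digest(password, lane_salt(salt, i)),
                     range(n))
        )
    return combine(digests)


def hash_sequential(password: bytes, salt: bytes, n: int) -> bytes:
    # Attacker's view: one lane at a time, so only one instance's worth of
    # memory is needed, at the price of N times the wall-clock time.
    digests = [instance_digest(password, lane_salt(salt, i)) for i in range(n)]
    return combine(digests)
```

Both functions return the same value because the lanes never read each
other's data; that independence is what makes the trade-off free.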


> (2) An additional parameter is now supported, swapBlockThreshold, which
> provides a *hybrid mode tradeoff* between side channel resistance and
> preventing an attacker from rearranging the order in which instructions are
> executed so as to better fit into the attacker's cache. BusyBeaver will now
> operate in side-channel-free mode the first 1/4 of the time, and then
> switch to block swapping. During block swapping, the roles of SaltBlock and
> PwBlock are exchanged at every jump, so that both blocks begin to control
> jumps and offsets, and both blocks are read from and written to. ParaBeaver
> supports swapBlockThreshold, and computes it sensibly for use in
> multithreaded execution.
>

Nice.  I like it.


> (3) The BusyBeaver class is now templated to allow use of an *arbitrary
> hash* and HMAC. SHA-512 continues to be the default and suggested hash.
>

Very good.


> (4) The implementation of *MulModOp64* now uses a right shift by 4 bits
> between multiplication and modulo in order to use a portion of
> multiplication result that preserves more entropy. The result of MulModOp64
> is now used via rotBits in all instructions, to prevent the CPU from
> detecting it's not used and optimizing it away.
>

That should work.  It also makes your algorithm more resistant to ASIC
attacks on compute time, since you're building chains of multiply operations.


> I believe this addresses all raised concerns, while preserving the
> simplicity of the algorithm. Please let me know if I have missed an
> important concern!
>

Well... not _all_ my concerns.  I would prefer that BusyBeaver_BlockSize
be a parameter to the algorithm rather than a constant.  That would make it
more suitable for both server and client defense, without adding
significant complexity.

With these changes, I think you're not losing any entropy, since every
operation is reversible.  That said, now you have a minor additional
problem: the computation prior to the final hash can be run entirely
backwards.  In most of the PHC entries, this could be used in some cases to
benefit an attacker in a TMTO attack, since it gives him more ways to
recompute values that are missing from memory.  In the case of BusyBeaver,
you overwrite memory so many times, I doubt there's much reason to be
concerned about TMTO attacks.

However, like Bcrypt, BusyBeaver in its current form uses very little
memory (it fits in L2 cache), and overwrites it many times.  This is good
for GPU defense, and this makes a nice server-side algorithm.  However, for
full-disk encryption or unwrapping user keys, I'd prefer to use _lots_ of
memory - maybe a GiB.  In this case, the memory won't be overwritten many
times like it is server-side.  Your algorithm would then become more
sensitive to TMTO attacks, as it should.  There's no point lowering
memory*time defense much due to paranoia about TMTO attacks.  Lyra2 suffers
a bit in this regard, making it a bit slower than Yescrypt and Argon2d.  A
reasonable defense is to introduce a cryptographic hash now and then into
your computations, which forces an attacker to compute forward.  For
example, you could compute hashState = SHA512(hashState || registerValue)
every 10000 or so loop iterations, and do registerValue ^=
lower64(hashState).
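
A rough sketch of that periodic hash step follows.  The loop body is a
placeholder reversible operation, not a BusyBeaver instruction, and reading
lower64 as the first 8 bytes of the digest in little-endian order is an
assumption; the names run, rotl64, and HASH_INTERVAL are made up for the
example.

```python
import hashlib

MASK64 = (1 << 64) - 1
HASH_INTERVAL = 10000  # "every 10000 or so loop iterations"


def rotl64(x: int, r: int) -> int:
    # Rotate a 64-bit value left by r bits (0 < r < 64).
    return ((x << r) | (x >> (64 - r))) & MASK64


def run(seed: bytes, iterations: int, interval: int = HASH_INTERVAL) -> int:
    hash_state = hashlib.sha512(seed).digest()
    register = int.from_bytes(hash_state[:8], "little")
    for i in range(1, iterations + 1):
        # Placeholder reversible step: multiply by an odd constant mod 2^64,
        # then rotate.  Both operations can be inverted, as in the post.
        register = rotl64((register * 0x9E3779B97F4A7C15) & MASK64, 13)
        if i % interval == 0:
            # hashState = SHA512(hashState || registerValue)
            hash_state = hashlib.sha512(
                hash_state + register.to_bytes(8, "little")
            ).digest()
            # registerValue ^= lower64(hashState)  (assumed byte order)
            register ^= int.from_bytes(hash_state[:8], "little")
    return register
```

Running backwards across a checkpoint would require knowing hash_state,
which itself depends on every earlier checkpoint, so the reversible
stretches are limited to one interval at a time.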

However, this is really a minor concern, IMO, so feel free to leave it out
;-)

Bill
