phc-discussions - Re: [PHC] Password hashing as a self-overwriting Turing machine: new version (July 3)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <1918386586-2928@skroderider.denisbider.com>
Date: Fri, 3 Jul 2015 08:15:42 +0100
From: denis bider <pwhashing@...isbider.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Password hashing as a self-overwriting Turing machine: new
 version (July 3)

> The result of MulModOp64 is now used via  rotBits in all instructions, to prevent the CPU from detecting it's not used and  optimizing it away.
 
Scratch that; MulModOp64 is composed of two underlying 32 -> 32 instructions, and rotBits only made use of the result of one of them.

Instead of tying the result of MulModOp64 via rotBits, it's now used to add to an internal register, which is pushed out on a regular basis via the fourth instruction.

I have updated the published archive:

http://www.denisbider.com/BusyBeaver-20150703.zip 
denis

   -----Original Message-----  From: denis bider  Sent: Friday, July 3, 2015 00:10  To: discussions@...sword-hashing.net  Subject: RE: [PHC] Password hashing as a self-overwriting Turing machine:  new version (July 3)    I have addressed reported shortcomings in BusyBeaver. Most of them (if not  all?) have been pointed out by Bill Cox.   The new latest version is here:   http://www.denisbider.com/BusyBeaver-20150703.zip   Changes:   (1) A parallel implementation, ParaBeaver, is now implemented separately in  ParaBeaver.h/.cpp. It uses C++11 threading in order to run a number of  concurrent threads processing BusyBeaver instances. ParaBeaver is useful on  clients that can expect to have most cores idle, and can benefit from  parallelization to gain security at no cost to the user. ParaBeaver works by  allocating room for N number of digests, which are computed in parallel based on  N number of variations on original salt and password inputs. When all N digests  have been computed, a hash of the concatenated digests forms the final result.  In the special case when N=1, the final result equals the result of the one  BusyBeaver instance. This is a very general parallelization construct that would  work well with practically any similar serial algorithm.   (2) An additional parameter is now supported, swapBlockThreshold, which  provides a hybrid mode tradeoff between side channel resistance and preventing  an attacker from rearranging the order in which instructions are executed so as  to better fit into the attacker's cache. BusyBeaver will now operate in  side-channel-free mode the first 1/4 of the time, and then switch to block  swapping. During block swapping, the roles of SaltBlock and PwBlock are  exchanged at every jump, so that both blocks begin to control jumps and offsets,  and both blocks are read from and written to. ParaBeaver supports  swapBlockThreshold, and computes it sensibly for use in multithreaded  execution.   (3) The BusyBeaver class is now templated to allow use of an arbitrary hash  and HMAC. SHA-512 continues to be the default and suggested hash.   (4) The implementation of MulModOp64 now uses a right shift by 4 bits  between multiplication and modulo in order to use a portion of multiplication  result that preserves more entropy. The result of MulModOp64 is now used via  rotBits in all instructions, to prevent the CPU from detecting it's not used and  optimizing it away.   I believe this addresses all raised concerns, while preserving the  simplicity of the algorithm. Please let me know if I have missed an important  concern!     In response to Marsh Ray:   > I don’t think you get to invoke the hardness of the halting problem  when your computation is known to complete in a fixed number of steps.   I'm not formally invoking the hardness of the halting problem. Instead, I'm  hypothesizing that there are reasons why this construction is not reducible, and  that these reasons are similar to that problem.   In intuitive terms, I believe this is because one cannot know what the  algorithm is going to do until the algorithm does it. Any attempt to forecast  what the algorithm will do is simulation, and simulation is as expensive as  execution.   Non-reducibility matters for an attacker who does not know the password.  Therefore, the attacker does not know the contents of the password block.   As of today, the algorithm has two phases:   - During the first phase, memory addresses are controlled by the salt  block, and the attacker knows where memory access is going to happen. Their  ability to reorder operations is limited in this phase by random  (salt-determined) memory access conflicts. For each address, about every 500th  operation is expected to read from it, and about every 1,000th operation is  expected to write to it. Each read/write also involves neighboring addresses,  and is performed at unaligned boundaries. What the attacker can do in this phase  is reorder operations to optimize how the memory access fits into their cache.  They still have to determine what operations to execute, and actually execute  them, separately based on each guess of the password.   - During the second phase, memory addresses are controlled by both blocks,  and both the salt block and the password block are read from and written to. At  this point, the attacker quickly loses all idea of where memory access is going  to happen, and can no longer even try to optimize for cache locality.
Content of type "text/html" skipped