Date: Tue, 31 Dec 2013 05:42:41 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Initial hashing function. Feedback welcome

On Sun, Dec 29, 2013 at 05:29:02PM -0500, Bill Cox wrote:
> I have written an initial hacked-up version of a program that for now I'm
> simply calling "keystretch".  Here's some ways it compares to scrypt:

Thank you for contributing your ideas!

> - On my Core i7 linux box, it's over 8X faster for a given memory size,
> single-threaded, no SSE

... but don't we mostly care about the performance _with_ SSE (or
better), and depending on use case, also with concurrent use of all
cores (for computation of one or of many separate hashes, also
depending on use case and other factors)?

8x can easily be the difference between scrypt's -ref and -sse (once
it's optimized like I did), although perhaps not between -nosse and
-sse.  I guess your benchmark is for scrypt's -nosse.

In my testing (with optimized scrypt code using AVX or better),
reducing the Salsa20 round count from 8 to 2 (a 4x reduction in round
count) results in less than a 2x speedup, which suggests that replacing
Salsa20/N with something way simpler is not going to make all that much
difference either.  I guess you may be seeing much more of a difference
in part because you're only benchmarking a single thread.

> - Memory is hashed as it's filled, rather than filling, then hashing.

This sounds similar to my changes to SMix's first loop:

http://www.openwall.com/lists/crypt-dev/2013/11/25/3

but extended to the second loop as well, replacing both of them with
one combined loop.  I thought of this too, but a drawback I saw was
that either I'd have to avoid random writes (which I had implemented as
an optional anti-TMTO feature) or I'd have to be computing two hashes
(two BlockMix'es in scrypt terms) per loop iteration (so that I write
two different values: one to the newly allocated V element and another
to the currently selected random element).
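For illustration only, here is a toy sketch of such a combined loop with two
writes per iteration.  This is my own simplification, not Bill's keystretch
code and not real scrypt: SHA-256 stands in for BlockMix, and the index
derivation is made up.

```python
import hashlib

def combined_loop(seed: bytes, n: int) -> bytes:
    # Toy stand-in for BlockMix -- NOT the real scrypt construction.
    def h(*parts: bytes) -> bytes:
        return hashlib.sha256(b"".join(parts)).digest()

    v = [h(seed)]  # V is filled as we hash, not in a separate first pass
    x = v[0]
    for _ in range(1, n):
        # pick a previously written element, as in SMix's second loop
        j = int.from_bytes(x[:8], "little") % len(v)
        y = h(x, v[j])     # first hash: the newly allocated V element
        v[j] = h(y, v[j])  # second hash: random write back into V (anti-TMTO)
        v.append(y)
        x = y
    return x
```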
At that point, it is unclear if there's any benefit from continuing to
expand the size of V after filling time/2 elements (if the price we pay
is spending twice as much time per element filled).  You might (or
might not) have avoided this dilemma with careful design of your RNG
(or whatever you call it), having it produce two distinct outputs in
less than 2x time.

That said, in a TMTO-friendly mode (if deliberately supported and
enabled), we may in fact use this approach and not run into the same
dilemma.

Another drawback is the very low probability that the last few V
elements written will ever be accessed.

> - 2048 rounds of PBKDF2_SHA256 are used at the start to generate an
> intermediate derived key.

I dislike this.  It's very wasteful in use cases where our total
running time is very limited.  This may be all the running time we can
afford (if at all), leaving no time for the memory-hard portion.

Additionally, when p>1 this adds implementation complexity for
implementations that actually try to make use of the parallelism.  In
original scrypt, they don't need to bother parallelizing the PBKDF2
step, because it's very fast anyway (no iterations).  When you have
bumped the iteration count up to a non-negligible number, PBKDF2 will
now need to be parallelized as well.

> - The derived key does not depend on the number of threads

... as long as the number of threads is not greater than a
pre-specified maximum.  I can claim this too. ;-)  And scrypt can claim
it too, although with scrypt there's a reduction in area-time from
increasing p, so it had better not be done unnecessarily.

> - All threads hash randomly from all of memory at the same time, so you
> can't run p smaller runs in series, where p is scrypt's parallelization
> parameter
>
> Credit for hashing all of memory with each thread goes to Alexander.  It's
> a good idea, IMO:
>
> http://www.openwall.com/lists/crypt-dev/2013/11/20/1

You wanted to refer to:

http://www.openwall.com/lists/crypt-dev/2013/12/19/1

I like this too.
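Coming back to the 2048-iteration step quoted above: with a standard PBKDF2
implementation it is a one-liner, which makes it easy to benchmark how much
of a fixed time budget it consumes.  A sketch using Python's hashlib (the
password and salt values here are made up for illustration):

```python
import hashlib

# 2048 iterations of PBKDF2-HMAC-SHA-256 deriving a 32-byte intermediate
# key, as in the quoted keystretch design; inputs are placeholders.
intermediate_key = hashlib.pbkdf2_hmac(
    "sha256", b"password", b"example salt", 2048, dklen=32
)
```

Timing this call (and the same call with scrypt's effective iteration count
of 1) is a quick way to see what the upfront step costs relative to the
memory-hard portion.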
;-)

Looking at your code, you're resolving the tradeoff between higher
area-time and the number of thread synchronizations differently than I
did, though.  I suggested having just one extra synchronization point
(between the equivalents of SMix's two loops), whereas in your case the
synchronization is continuous, and the number of times a thread will
have to wait for other threads' data to become available is not
pre-specified.

> Performing the typical 2048 SHA-256 rounds on the password at the start,
> and then clearing the password, ensures that we have security at least as
> good as OpenSSL can provide today for our ssh private keys.  Any additional
> security is in positive territory.

That's good, but for many use cases the price is too high.  I see the
advantage of being able to wipe the password early on, though, and of
not filling lots of memory with data that is quickly derived from the
password (so, if leaked, it may provide for quick password cracking
too).

Also, for password hashing bcrypt is a better baseline, and you don't
reach it - as it relates to attacks with GPUs - by 2048 iterations of
PBKDF2-HMAC-SHA-256.  You appear to use random 64-bit lookups within a
page, and that's quite similar to how bcrypt happens to defeat GPUs now
(with 32-bit lookups in 4x1 KB S-boxes), so you might be lucky in that
respect.  Unfortunately, the same property prevents a defensive
implementation of your hash from using SIMD (without 64-bit gather
loads).

> Having the hash result not depend on the number of threads, and also
> filling memory as we hash, allows me to have a stop pattern that I look for
> when decrypting.  Along with the salt, I can store the final hash value,
> which also looks random.  When decrypting, I fill memory until this stop
> value is seen.  This allows better deniability for the encrypted file, as
> both the hash and stop value appear to be random.  This is important for
> TrueCrypt.

Nice.
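To make the stop-pattern idea concrete, here is a toy rendering of the
quoted scheme (again my own sketch, not Bill's code, with SHA-256 standing
in for the real memory-filling hash): decryption keeps filling memory until
the running hash matches the stored stop value, and a wrong key simply
never produces it.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def fill_until_stop(key: bytes, stop_value: bytes, limit: int = 1 << 20):
    """Fill/hash memory until the running hash equals the stored stop
    value; return the filled memory on success, None if the stop value
    is never seen within `limit` steps (i.e., the key is wrong)."""
    v = [h(key)]
    x = v[0]
    for _ in range(limit):
        if x == stop_value:
            return v  # stop pattern seen: memory filled, key accepted
        j = int.from_bytes(x[:8], "little") % len(v)
        x = h(x, v[j])
        v.append(x)
    return None  # stop value never reached: wrong key (or limit too low)
```

Since both the salt and the stop value look random, an attacker inspecting
the stored data cannot distinguish them from random bytes, which is the
deniability property claimed above.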
I think this has a drawback, though: a TrueCrypt cracker, if aware of
the settings that were used (or able to guess them fairly reliably),
will be able to reject wrong keys right after the KDF, without
proceeding with actual decryption of the user's data and checking its
checksum.

> A weakness of the hashing function vs scrypt is that it is a simple
> non-cryptographic hash, rather than scrypt's Salsa20/8.  This is the
> primary reason it runs faster.  If we do not need a strong cryptographic
> hash, there is significant opportunity for improving performance.
>
> Is there any reason such a simple hash function should not be used?  I am
> particularly interested in feedback on this point.

This is a tough question.  I think the required properties of the hash
you use on every iteration depend on how your loops are structured.

Alexander