lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 25 Mar 2015 12:40:15 +0300 From: Solar Designer <solar@...nwall.com> To: discussions@...sword-hashing.net Subject: Re: [PHC] Another PHC candidates "mechanical" tests (ROUND2) On Wed, Mar 25, 2015 at 09:48:06AM +0100, Milan Broz wrote: > On 03/25/2015 05:49 AM, Solar Designer wrote: > > I think you should include more info on your two test platforms. Most > > notably, the specific CPU types and clock rate. > > OK, I am trying to avoid these test to be presented as a performance tests > because with reference implementations it would be unfair. > > Anyway, there is already lscpu for both platforms in git > https://github.com/mbroz/PHCtest/blob/master/output/round2_Lenovo_X230_i5_16G/lscpu > > https://github.com/mbroz/PHCtest/blob/master/output/round2_PPC64/lscpu > (This is quite a big machine, actually running Fedora for some Red Hat > distro development I have access to) Thanks! > > The charts showing real memory usage and real time vs. the cost > > parameter values might be useful, but it would also be useful to see a > > cross-chart of real memory usage vs. real time (with the lowest time cost > > setting supported by each scheme) and with growing memory cost setting > > (this is how you'd control the real time). > > Isn't it what Figure 1 already shows? No. > Or you mean some combination? Yes. > (tcost is minimal for each candidate there, memory cost is increasing and chart > shows real memory use + real run time). Yes, but the x-axis is in different units for different candidates. I'd like a chart combining the data for y-axes from these two charts, using one of these on the new chart's x-axis and the other on y-axis. > > Lyra2 is surprisingly fast. That's nice. Your table does not give > > exact numbers for candidates meeting your requirements, though - maybe > > you can add that (for their lowest supported t_cost)? > > For the last KDF test the exact input parameters are in Table 5, times > are in log directory. For example with the minimal t_cost: > https://github.com/mbroz/PHCtest/blob/master/output/round2_Lenovo_X230_i5_16G/m_kdf/lyra2.dat > https://github.com/mbroz/PHCtest/blob/master/output/round2_Lenovo_X230_i5_16G/m_kdf/lyra2-sse.dat So it's 515 ms for the SIMD-enabled Lyra2 to reach 1 GB. Including memory (de)allocation overhead, right? > I did not want to modify default internal configurations (and I think it > should not be even needed. Give users some knobs and they'll find insecure > combination and will use it as the default... well, joking :-) This is always a concern, yes. > Why I do not use multiple threads is based on disk-encryption my use case: > > In general, FDE-enabled disk can be moved between several machines with > completely different performance and hardware configurations. > > On slow machine unlocking should take more time and it would be > probably acceptable (in some reasonable limits). > > But if it cannot be unlocked at all (cannot run parallel threads) it is > unacceptable for the user (usability fail). Security/usability tradeoff... No, this particular tradeoff does not exist. You can always run the would-be-threads sequentially (or switch between them). e.g. I've been testing yescrypt with p=41 on a machine with 8 logical CPUs, running only 8 threads at a time. In the current submitted yescrypt code, OpenMP takes care of that. Or you can simply build without multi-threading support (neither OpenMP nor anything else) and still be able to compute a p > 1 derived key, just slower as you say. The real tradeoff is that excessive parallelism weakens security: the attacker might need less memory (presumably expensive) per compute core (presumably cheap). For this reason, maybe yescrypt's default pwxform rounds count isn't best for your use case. It works better for user authentication where the parallelism comes from concurrent requests anyway. There's another reason for the 6 rounds, though - this achieves bcrypt-like rate of S-box lookups, needed to ensure bcrypt-like GPU attack resistance. > The second reason is that it can could run in a very limited environment > (a bootloader for full system encryption) where threads are not easily manageable. So you can run a thread-less build. No problem there. You just need to provide the memory... ...Speaking of which: this is where a security/usability tradeoff actually exists. If you use e.g. Lyra2 or YESCRYPT_RW and require 1 GB, you'd have a really hard time computing on a machine with less RAM than that since we're deliberately discouraging TMTO attacks (and thus legitimate uses as well). Classic scrypt and YESCRYPT_WORM solve this, at the expense of lower area-time product and TMTO-friendliness for attackers as well. (I think there is not yet a defender-friendly implementation of classic scrypt, nor YESCRYPT_WORM, for uses on lower memory machines. But it's just a matter of implementation. This can be implemented easily when the need arises.) <plug>I think yescrypt is the only entry in this competition that retained this flexibility.</plug> Unfortunately, the security impact of YESCRYPT_WORM (vs. _RW) is very significant. It's almost the same as going back to classic scrypt, with only minor tweaks added (such as the garbage collector attack resistance I mentioned earlier today). (In part, this is deliberate: to ease implementation of YESCRYPT_WORM by slightly revising existing implementations of classic scrypt.) Alexander
Powered by blists - more mailing lists