lists.openwall.net - Open Source and information security mailing list archives
Date: Sat, 4 Jan 2014 10:31:42 +0400
From: Solar Designer <solar@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Reworked KDF available on github for feedback: NOELKDF

On Fri, Jan 03, 2014 at 03:12:40PM -0500, Bill Cox wrote:
> The code is at:
>
> https://github.com/waywardgeek/noelkdf

Two more comments on it:

This appears to select the random page index based on the first
uint64_t of a page:

    // Select a random from page
    fromPage = mem + PAGE_LENGTH*(*prevPage % i);

and you appear to be computing uint64_t's of a page sequentially, in
increasing order. Thus, the next random page index becomes known almost
as soon as you've started processing a page. This may be intentional
(e.g., EARWORM deliberately allows for one-ahead prefetch, but it
targets memory bandwidth and doesn't try to be sequential memory-hard),
but probably it is not (it provides extra parallelism and allows for
much higher-latency memory to be used efficiently, which you're not
making use of - at least not yet - so it benefits attackers). scrypt
uses the last (not the first) element of a block to determine the
random index.

PAGE_LENGTH of 16 KB is probably too large for currently common CPUs,
considering that you're working with 3 such pages at once (prev, from,
to), you'd optimally run 2 threads/core on many current CPUs, and the
CPUs have only 32 KB of L1 data cache per core. I think you need to set
PAGE_LENGTH to 4 KB, which means that you'd be using 24 KB of L1 data
cache for the pages (and some of the rest for other temporary data).

If you make the from page loads non-temporal, you might be able to
increase PAGE_LENGTH to 8 KB and use the full 32 KB in this way (with a
little bit of cache thrashing because of other temporary data). The
stores should continue to go to cache+memory, because you're reading
from the prev page (so you need it cached) and the next iteration will
similarly read from the current page (so you need the current stores to
be cached, too).
A further optimization may then be to start using the non-temporal hint
only once a size threshold is exceeded (e.g., once the amount of data
written exceeds L3 cache size times a coefficient to be tuned). All of
this assumes sufficient L1 data cache associativity, which is generally
the case on current x86 CPUs.

Alexander