Date: Tue, 19 May 2015 00:39:03 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: GPU vs CPU benchmarks for Makwa

Hello,

I have made some OpenCL implementations for Makwa (actually, for the
modular squarings where Makwa spends most of its time), optimized for a
Radeon HD 7990 GPU ("Tahiti" devices). I also compared the resulting
performance with what can be achieved with an Intel i7 4770K (Haswell
core, implementation uses AVX2 opcodes).

The comparison report is there:
   http://www.bolet.org/makwa/makwa-gpu-20150518.pdf

The OpenCL code can be downloaded here:
   http://www.bolet.org/makwa/Makwa-OpenCL-20150518.tar.gz

The test machines are from the Openwall HPC Village, access to which
was generously provided by Solar Designer. Let me express my deep thanks
for making such a resource available for free to random developers like
myself.

Report highlights:

-- I get 31.8 million modular squarings per second on the GPU.

-- On the CPU, I can do 5.45 million modular squarings per second.

-- When taking into account hardware cost and energy consumption (which
   is not as easy as it seems), the GPU turns out to be 1.72 times more
   efficient than the CPU.

-- There are reasons to believe (or hope, depending on point of view)
   that the current slight(*) GPU advantage will not last (especially
   because of AVX-512).

-- Increasing the modulus size to 4096 bits should again put the CPU in
   the lead.

Makwa, as specified today, supports arbitrary modulus sizes (the minimum
is 1280 bits; there is no formal maximum, but reference implementations
can go at least to 32768 bits). The _recommended default_ is currently
2048 bits; in the light of these benchmarks, I might raise that
recommendation to 4096 bits (subject to obtaining more data points).

(*) I use the term "slight" because the GPU is less than twice as
efficient as the CPU. Within the report, I claim them to be "on par" for
the same reason. Of course, a 1.72x factor is not negligible from a
business point of view; but for security, this is a low factor. Some
investment in user education in choosing passwords (or, better yet,
using a "password safe") is likely to yield much greater improvements.

	--Thomas Pornin
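
To make the benchmarked operation concrete, here is a minimal Python
sketch of a chain of modular squarings, together with the raw throughput
ratio implied by the two figures quoted above. It is not taken from the
Makwa reference code or from the OpenCL tarball: the function name, the
toy modulus and the work factor are assumptions chosen for illustration,
and the cost/energy normalization that leads to the 1.72x figure is
described in the report and is not reproduced here.

```python
def repeated_squarings(x: int, n: int, w: int) -> int:
    """Square x modulo n, w times in a row; the result is x^(2^w) mod n."""
    for _ in range(w):
        x = (x * x) % n
    return x


# Hypothetical toy modulus (product of two Mersenne primes, 216 bits).
# Far too small for real use: the message gives 1280 bits as the minimum.
n = (2**127 - 1) * (2**89 - 1)

# Hypothetical work factor: number of successive squarings.
w = 1000

y = repeated_squarings(5, n, w)

# Sanity check against Python's built-in modular exponentiation.
assert y == pow(5, 2**w, n)

# Throughput figures quoted in the message (modular squarings per second).
gpu_rate = 31.8e6
cpu_rate = 5.45e6

# Raw speed ratio, before any hardware cost or energy weighting.
raw_ratio = gpu_rate / cpu_rate
print(f"raw GPU/CPU throughput ratio: {raw_ratio:.2f}")
```

The raw ratio comes out at a bit under 6, noticeably higher than the
1.72x efficiency figure, which is why the choice of cost and energy
weighting matters so much when reading the comparison.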