phc-discussions - GPU vs CPU benchmarks for Makwa

Message-ID: <20150518223903.GA16657@bolet.org>
Date: Tue, 19 May 2015 00:39:03 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: GPU vs CPU benchmarks for Makwa

Hello,

I have made some OpenCL implementations for Makwa (actually, for the
modular squarings where Makwa spends most of its time), optimized for a
Radeon HD 7990 GPU ("Tahiti" devices). I also compared the resulting
performance with what can be achieved with an Intel i7 4770K (Haswell
core, implementation uses AVX2 opcodes).
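
For readers unfamiliar with the operation being benchmarked, the
following is a minimal Python sketch of a chain of modular squarings.
It is not the OpenCL or AVX2 code from the report; the function name
repeated_squaring and the constant TOY_MODULUS are hypothetical, and
the modulus is an arbitrary small number chosen only so the example
runs instantly (the post gives 1280 bits as the minimum real size).

```python
def repeated_squaring(x: int, n: int, count: int) -> int:
    """Square x modulo n, `count` times in sequence.

    Each step depends on the previous result, so a single chain
    cannot be parallelized; throughput figures count how many such
    squaring steps the hardware completes per second.
    """
    for _ in range(count):
        x = (x * x) % n
    return x


# Hypothetical toy modulus, far too small for any real use.
TOY_MODULUS = 1000003 * 1000033

if __name__ == "__main__":
    result = repeated_squaring(3, TOY_MODULUS, 10)
    # Ten squarings in sequence equal one exponentiation by 2**10.
    print(result == pow(3, 2**10, TOY_MODULUS))
```

The final comparison uses Python's built-in three-argument pow() as an
independent check that the loop computes x^(2^count) mod n.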

The comparison report is here:
   http://www.bolet.org/makwa/makwa-gpu-20150518.pdf

The OpenCL code can be downloaded here:
   http://www.bolet.org/makwa/Makwa-OpenCL-20150518.tar.gz

The test machines are from the Openwall HPC Village, access to which
was generously provided by Solar Designer. Let me express my
deep thanks for making such a resource available for free to random
developers like myself.

Report highlights:

 -- I get 31.8 million modular squarings per second on the GPU.

 -- On the CPU, I can do 5.45 million modular squarings per second.

 -- When taking into account hardware cost and energy consumption (which
 is not as easy as it seems), the GPU turns out to be 1.72 times more
 efficient than the CPU.

 -- There are reasons to believe (or hope, depending on point of view)
 that the current slight(*) GPU advantage will not last (especially
 because of AVX-512).

 -- Increasing the modulus size to 4096 bits should again put the CPU in
 the lead. Makwa, as specified today, supports arbitrary modulus sizes
 (the minimum is 1280 bits; there is no formal maximum, but the
 reference implementations can go to at least 32768 bits). The
 _recommended default_ is currently 2048 bits; in the light of these
 benchmarks, I might raise that recommendation to 4096 bits (subject to
 obtaining more data points).

(*) I use the term "slight" because the GPU is less than twice as
efficient as the CPU. Within the report, I claim them to be "on par" for
the same reason. Of course, a 1.72x factor is not negligible from a
business point of view; but for security, this is a low factor. Some
investment in user education in choosing passwords (or, better yet,
using a "password safe") is likely to yield much greater improvements.
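
The figures above can be related to each other with a little
arithmetic. The sketch below is only an illustration: it assumes that
"efficiency" means throughput divided by a single combined cost figure,
which may not match the report's exact method, and it expresses the
1.72x factor as an equivalent number of bits of password entropy. All
variable and function names are hypothetical.

```python
import math

# Figures quoted in the report highlights.
GPU_SQUARINGS_PER_SEC = 31.8e6
CPU_SQUARINGS_PER_SEC = 5.45e6
COST_ADJUSTED_GPU_ADVANTAGE = 1.72

# Raw throughput ratio, before any cost adjustment (about 5.8).
raw_ratio = GPU_SQUARINGS_PER_SEC / CPU_SQUARINGS_PER_SEC

# Assumption: efficiency = throughput / cost. Under that assumption
# the GPU setup is implied to cost about 3.4 times as much as the CPU.
implied_cost_ratio = raw_ratio / COST_ADJUSTED_GPU_ADVANTAGE


def advantage_in_bits(factor: float) -> float:
    """Express an attacker speed-up factor as bits of password entropy."""
    return math.log2(factor)


if __name__ == "__main__":
    print(f"raw GPU/CPU throughput ratio: {raw_ratio:.2f}")
    print(f"implied GPU/CPU cost ratio:   {implied_cost_ratio:.2f}")
    bits = advantage_in_bits(COST_ADJUSTED_GPU_ADVANTAGE)
    print(f"1.72x advantage in bits:      {bits:.2f}")
```

A 1.72x advantage is worth roughly 0.78 bits, whereas a single extra
random decimal digit in a password adds log2(10), about 3.32 bits,
which is one way to see why the factor is called low for security.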

	--Thomas Pornin