lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 27 Feb 2014 17:56:17 -0000
Subject: Re: [PHC] GPU multiplication speed?

> However, SHA-256, Blake2, and other hash
> functions have a lot of parallelism

Actually they don't. Well, Blake2 has some "inner" parallelism (which
helps in leveraging SIMD instruction sets like SIMD), but nothing close to
the amount of parallelism in which GPU thrive. SHA-256 is very sequential.

What works well with GPU is running several independent instances of
SHA-256 simultanesouly; this works well because each instance will operate
on its own set of registers (you'd need about 26 or 27 registers per
instance, and that fits well into GPU resources) and there is no data
dependent memory access (indeed memory accesses are so regular that the
code can run in registers _only_). Such parallelism is nice for an
attacker, because the attacker, by definition, has a lot of passwords to
try. The time taken for each individual password hashing does not matter
much to the attacker; it is the number of hashed passwords per time unit
which conditions the attacker's success rate.

> If this is correct, wouldn't an effective strategy be to have a long
> chain of sequential operations that depend on the prior result, and to
> hash a lot of memory?

A long chain of sequential operations does not make things harder for the
attacker. Using memory, on the other hand, will harm the attacker, in two
-- If each instance uses some non-negligible amount of RAM, the attacker
won't be able to maintain many simulatenous instance in its GPU RAM.
-- If the memory is "real" memory (i.e. the access pattern is
data-dependent, hence cannot be mapped to registers), then this will
prevent most of the parallelism in the GPU.

To state things in simple (simplistic ?) words, GPU are good at
parallelizing jobs which fits in registers, but bad at parallelizing jobs
which do not (i.e., which use RAM).

        --Thomas Pornin

Powered by blists - more mailing lists