lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 21 Apr 2015 14:51:26 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Argon2 modulo division

On Tue, Apr 21, 2015 at 06:26:31AM +0300, Solar Designer wrote:
> It appears that Argon2 uses modulo division with arbitrary (and
> changing) divisors (usually not powers of 2).  Argon2d applies this to
> secret-dependent integers.  This is an extra source of timing leaks, on
> top of secret-dependent lookup addresses.
> 
> Do we consider this a drawback?

Generally, small integer division algorithms that are not constant-time
have a running time that depends on the length of the operands, but not
on the actual values. Thus, if you need to compute x % y while hiding
the length of x, then you can compute (x + k*y) % y  with a suitable
constant k (preferably a power of 2...). For instance, if x is a secret
integer in the 0..0xFFFF range, then set k = 2^16. This won't hide the
size of the divisor (y), but it should make execution time (and
conditional branch pattern) independent of the actual value of x. The
cost of the extra shift-and-add should be small compared to that of the
division.


I make this remark without having looked at Argon2 so I don't know
whether it applies well or not. Also, note that while some architectures
offer division opcodes (x86, PowerPC...), others do not (especially ARM
before ARMv7, and also some ARMv7 variants); the latter will use a
software implementation that will be linked in and possibly inlined by
the compiler. For architectures without a hardware divide, a
special-purpose fully-unrolled division function could be written, that
offers constant-time processing. It is possible that even on some
architectures with a hardware divide, an unrolled no-division code chunk
could offer comparable or even better performance (this depends quite a
lot on what is known at compile-time about the dividend and the
divisor).


	--Thomas Pornin

Powered by blists - more mailing lists