lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 25 Jun 2014 09:22:41 +0400
From: Solar Designer <>
Subject: Re: [PHC] TwoCats multiplication chain

On Sun, Jun 22, 2014 at 03:13:02AM +0100, Samuel Neves wrote:
> On 06/21/2014 10:06 PM, Solar Designer wrote:
> > An attack which I think we haven't mentioned yet is speculative
> > computation of multiple possibilities of the next MUL before a previous
> > MUL completes.
> This may have been considered before, but I couldn't find it on this list. Have you considered modular multiplication
> instead?

I had not considered it.  Thank you for bringing this up.

> It does a much better job at mixing the bits evenly (and not only from right-to-left), and should be more
> effective against the kind of attack you describe. The friendliest modulus for CPUs might be 2^n - 1. For example,
> here's x86 code for multiplication modulo 2^32-1:
>     # compute eax * edx mod 2^32-1
>     mul edx      # edx:eax = eax * edx
>     add eax, edx # edx*2^32 + eax = (edx*1 + eax) mod 2^32-1
>     adc eax, 0   # addition above may overflow
>     # handle special case eax*edx mod 2^32-1 = 2^32-1 = 0
>     # potentially unnecessary
>     # a branch could be faster here, since it's quite predictable
>     sub eax, -1  # CF set on eax != 2^32-1
>     sbb eax,  0  # always add 1 back, except on 2^32-1
> It's a little trickier with SIMD, due to the required shuffling and lack of explicit carries, but I think it could also
> be made reasonably fast.

While the code above looks fast, it is roughly twice slower than a MUL
alone - so not a good enough idea for dealing with a potential attack
that might allow for up to a 2x speedup.  The special case handling
(last sub/sbb) could probably be omitted, but the "adc eax, 0" results
in just two possible (non-)changes to eax (either there was carry or not),
which looks very susceptible to a similar kind of speculative processing
that we wanted to mitigate.

So while it could be a good idea in terms of bits mixing, I think it is
not good enough to implement with individual instructions (until a CPU
gets a single instruction like this, with same latency as MUL alone).

Thanks again,


Powered by blists - more mailing lists