Date:	Wed, 25 Jan 2012 19:35:19 -0800
From:	Linus Torvalds <>
To:	Herbert Xu <>
Cc:	"David S. Miller" <>,
	Linux Kernel Mailing List <>,
	Linux Crypto Mailing List <>
Subject: Re: Crypto Fixes for 3.3

On Wed, Jan 25, 2012 at 6:43 PM, Herbert Xu <> wrote:
> This push fixes a race condition in sha512 that affects users
> who use it in process context and softirq context concurrently,
> in particular, this affects IPsec.  The result of the race is
> the production of incorrect hashes, which for IPsec leads to
> loss of connectivity.

Ugh. This once more has the crazy signed integer modulus operator,
which can be quite expensive depending on whether the compiler can
tell whether it is always positive or not.

Also, that modulus is exposed everywhere.

In git, the sha1 implementation (which has many of the same issues) does this:

  /* This "rolls" over the 512-bit array */
  #define W(x) (array[(x)&15])

which means that the modulus exists in just one place (and is the
correct binary 'and', not the possibly-expensive division).

We also avoid the problem with absolutely horrible gcc register usage
by having an arch-specific "accessor macro":

  /*
   * If you have 32 registers or more, the compiler can (and should)
   * try to change the array[] accesses into registers. However, on
   * machines with less than ~25 registers, that won't really work,
   * and at least gcc will make an unholy mess of it.
   * So to avoid that mess which just slows things down, we force
   * the stores to memory to actually happen (we might be better off
   * with a 'W(t)=(val);asm("":"+m" (W(t))' there instead, as
   * suggested by Artur Skawina - that will also make gcc unable to
   * try to do the silly "optimize away loads" part because it won't
   * see what the value will be).
   * Ben Herrenschmidt reports that on PPC, the C version comes close
   * to the optimized asm with this (ie on PPC you don't want that
   * 'volatile', since there are lots of registers).
   * On ARM we get the best code generation by forcing a full memory barrier
   * between each SHA_ROUND, otherwise gcc happily get wild with spilling and
   * the stack frame size simply explode and performance goes down the drain.
   */

  #if defined(__i386__) || defined(__x86_64__)
    #define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))
  #elif defined(__GNUC__) && defined(__arm__)
    #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
  #else
    #define setW(x, val) (W(x) = (val))
  #endif

which is not pretty, but as you guys found out, the alternative can be
much worse (ie totally crazy gcc register spilling).
