Date:	Wed, 25 Jan 2012 19:35:19 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Herbert Xu <herbert@...dor.apana.org.au>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux Crypto Mailing List <linux-crypto@...r.kernel.org>
Subject: Re: Crypto Fixes for 3.3

On Wed, Jan 25, 2012 at 6:43 PM, Herbert Xu <herbert@...dor.apana.org.au> wrote:
>
> This push fixes a race condition in sha512 that affects users
> who use it in process context and softirq context concurrently,
> in particular, this affects IPsec.  The result of the race is
> the production of incorrect hashes, which for IPsec leads to
> loss of connectivity.

Ugh. This once more has the crazy signed integer modulus operator,
which can be quite expensive depending on whether the compiler can
tell whether it is always positive or not.

Also, that modulus is exposed everywhere.

In git, the sha1 implementation (which has many of the same issues) does this:

  /* This "rolls" over the 512-bit array */
  #define W(x) (array[(x)&15])

which means that the modulus exists in just one place (and is the
correct binary 'and', not the possibly-expensive division).
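
Something like this (just a sketch, not the actual kernel or git code)
shows what the compiler has to deal with in each case:

  /*
   * Sketch only: the index is never actually negative in the real
   * code, but if its type is signed and the compiler cannot prove
   * that, 't % 16' may turn into a real division or extra fixup
   * code, while 't & 15' is always a single cheap mask.
   */
  unsigned int load_mod(int t, const unsigned int *array)
  {
          return array[t % 16];   /* possibly-expensive signed modulus */
  }

  unsigned int load_and(int t, const unsigned int *array)
  {
          return array[t & 15];   /* the cheap binary 'and' */
  }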

We also avoid the problem with absolutely horrible gcc register usage
by having an arch-specific "accessor macro":

  /*
   * If you have 32 registers or more, the compiler can (and should)
   * try to change the array[] accesses into registers. However, on
   * machines with less than ~25 registers, that won't really work,
   * and at least gcc will make an unholy mess of it.
   *
   * So to avoid that mess which just slows things down, we force
   * the stores to memory to actually happen (we might be better off
   * with a 'W(t)=(val);asm("":"+m" (W(t))' there instead, as
   * suggested by Artur Skawina - that will also make gcc unable to
   * try to do the silly "optimize away loads" part because it won't
   * see what the value will be).
   *
   * Ben Herrenschmidt reports that on PPC, the C version comes close
   * to the optimized asm with this (ie on PPC you don't want that
   * 'volatile', since there are lots of registers).
   *
   * On ARM we get the best code generation by forcing a full memory barrier
   * between each SHA_ROUND, otherwise gcc happily gets wild with spilling and
   * the stack frame size simply explodes and performance goes down the drain.
   */

  #if defined(__i386__) || defined(__x86_64__)
    #define setW(x, val) (*(volatile unsigned int *)&W(x) = (val))
  #elif defined(__GNUC__) && defined(__arm__)
    #define setW(x, val) do { W(x) = (val); __asm__("":::"memory"); } while (0)
  #else
    #define setW(x, val) (W(x) = (val))
  #endif

which is not pretty, but as you guys found out, the alternative can be
much worse (ie totally crazy gcc register spilling).
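
For what it's worth, the accessor then gets used in the message-schedule
expansion roughly like this (again only a sketch, not git's actual round
code; it relies on W()/setW() as defined above, which refer to a local
named 'array'):

  /*
   * Sketch only: words 16..79 of the SHA1 schedule are computed from
   * earlier words read through W() and written back through setW(),
   * so the buffer stays 16 words and the store to memory really
   * happens on the architectures that ask for it.
   */
  #define SHA_ROT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

  static void expand_schedule(unsigned int *array)
  {
          int t;

          for (t = 16; t < 80; t++)
                  setW(t, SHA_ROT(W(t - 3) ^ W(t - 8) ^ W(t - 14) ^ W(t - 16), 1));
  }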

                    Linus