linux-kernel - Re: x86, microcode: BUG: microcode update that changes x86_capability

Message-ID: <20140924145658.GB31678@khazad-dum.debian.net>
Date:	Wed, 24 Sep 2014 11:56:58 -0300
From:	Henrique de Moraes Holschuh <hmh@....eng.br>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Chuck Ebbert <cebbert.lkml@...il.com>,
	Andy Lutomirski <luto@...capital.net>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: x86, microcode: BUG: microcode update that changes x86_capability

On Tue, 23 Sep 2014, Borislav Petkov wrote:
> On Fri, Sep 19, 2014 at 01:42:17PM -0300, Henrique de Moraes Holschuh wrote:
> > 1. offline a "guinea pig" group of "cpus", i.e. an entire "microcode update
> > unit" that doesn't include the BSP.  This is going to be a pain, as what
> > composes a "microcode update unit" is not set in stone, and could change in
> > a future microarch.
> 
> I'm pretty sure it is very dangerous to run with different microcode
> revisions on different cores. Your plan won't fly and I have a hard time
> understanding why one would do such a thing even if it did work.

I don't want that plan to fly either; it is too complex, and I said as much
at the end of that email.  I won't bother listing the situations where it
would be helpful; they're not very interesting.

On the topic of microcode revision skew in a multi-processor system: 

For a long time we had an Extremely Bad userspace interface that required
userspace to trigger the microcode update once per cpu, and it fetched the
microcode from userspace once per cpu.

This made for an absurdly large time window during which we'd have
microcode revision skew across cpus, and yet nothing blew up sky-high.  If
microcode revision skew were not generally safe, we'd have run into a lot
of trouble already.

In fact, we still run the system with microcode revision skew while the
microcode update is taking place through the regular microcode driver, as
it is serialized one cpu at a time, and the other cpus are active and
running.
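On Linux, userspace can observe this skew through the per-CPU "microcode"
field in /proc/cpuinfo.  The C sketch below is only an illustration: the
function name is hypothetical (not a kernel or libc interface), and it
assumes the field is printed as "microcode : <number>", as current x86
kernels do.  It scans text in that format and reports whether any CPU
shows a revision different from the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan text in /proc/cpuinfo format, count the
 * "microcode" lines, and set *skew to true if any CPU reports a revision
 * different from the first CPU's.
 */
static int microcode_revision_skew(const char *cpuinfo, bool *skew)
{
    unsigned long first = 0;
    int count = 0;
    const char *line = cpuinfo;

    assert(cpuinfo != NULL && skew != NULL);
    *skew = false;

    while (line != NULL && *line != '\0') {
        /* Expected shape: "microcode\t: 0x1f" (base 0 accepts hex or decimal) */
        if (strncmp(line, "microcode", 9) == 0) {
            const char *colon = strchr(line, ':');
            const char *eol = strchr(line, '\n');

            if (colon != NULL && (eol == NULL || colon < eol)) {
                unsigned long rev = strtoul(colon + 1, NULL, 0);

                if (count == 0)
                    first = rev;
                else if (rev != first)
                    *skew = true;
                count++;
            }
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return count;
}
```

On a live system one would read /proc/cpuinfo into a buffer and pass it
in.  Note that a single read is just a snapshot: it can easily miss a skew
window that opens and closes between two samples.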

I don't know about AMD, but on Intel, the time it takes to update the
microcode on a core is anything but negligible[1], so the microcode
version skew window still exists, and it is not small.  It is much smaller
than it once was, but it is still there.

The only way to really minimize the risk of microcode version skew is to
limit oneself to firmware and early initramfs microcode updates.

> If we're going to have to hide stuff which software might be using, I
> don't see a way around rebooting.

Nor do I.

But IMHO we still need to detect and do something smart when
x86_capability changes due to a microcode update.

And I'd really prefer it to be "update x86_capability, warn the user and
carry on" for anything that is not going to crash the kernel.  Several
distros will really want this backported to -stable, as the older kernels
cannot do early microcode updates.
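To make "update x86_capability, warn the user and carry on" concrete,
here is a userspace model of the comparison step.  It is a sketch under
stated assumptions, not kernel code: NCAPWORDS is a hypothetical stand-in
for the kernel's NCAPINTS (whose real value varies between versions), and
cap_diff_warn() is a made-up name.  It compares a saved copy of the
capability words with a fresh copy taken after a late update, warns about
every bit that flipped (using the word*32+bit notation of the X86_FEATURE_*
definitions), and returns how many features appeared or disappeared.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for NCAPINTS; the real value differs by kernel version. */
#define NCAPWORDS 10

struct cap_delta {
    int gained;  /* bits set after the update but not before */
    int lost;    /* bits set before the update but not after */
};

/*
 * Compare old and new capability words, warn about each flipped bit,
 * and return the totals.  The caller would then adopt new_caps and
 * carry on, unless a lost feature is one the kernel cannot live without.
 */
static struct cap_delta cap_diff_warn(const uint32_t old_caps[NCAPWORDS],
                                      const uint32_t new_caps[NCAPWORDS])
{
    struct cap_delta d = { 0, 0 };

    assert(old_caps != NULL && new_caps != NULL);

    for (int w = 0; w < NCAPWORDS; w++) {
        uint32_t changed = old_caps[w] ^ new_caps[w];

        for (int b = 0; b < 32; b++) {
            uint32_t mask = UINT32_C(1) << b;

            if (!(changed & mask))
                continue;
            if (new_caps[w] & mask) {
                d.gained++;
                fprintf(stderr, "warning: feature %d*32+%d appeared\n", w, b);
            } else {
                d.lost++;
                fprintf(stderr, "warning: feature %d*32+%d disappeared\n", w, b);
            }
        }
    }
    return d;
}
```

Distinguishing "gained" from "lost" matters here: a newly appearing
feature is usually harmless to ignore, while a feature that disappears may
already be in use by the kernel or userspace, which is the case where only
a reboot really helps.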

[1] Intel processors take from 200 thousand cycles to several million
    cycles per core to successfully apply a microcode update.  Verified
    using get_cycles() right before and right after the WRMSR 0x79
    (IA32_BIOS_UPDT_TRIG).  Variance was really high, about 10%.  My
    limited testing matched what has been previously reported by Ben
    Hawkes.
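    To put those cycle counts in wall-clock terms, the sketch below
    assumes a fixed clock (3 GHz in the examples) and ignores any gap
    between one cpu's update and the next, so it shows scale only.  At
    3 GHz, 200 thousand cycles is about 67 us, and a hypothetical 3
    million cycles is 1 ms per core.  With serialized updates, the skew
    window is roughly (ncpus - 1) times the per-cpu cost, e.g. about 15 ms
    on a 16-cpu box at 1 ms per cpu.  Both function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to microseconds at a fixed, assumed clock rate. */
static double cycles_to_us(uint64_t cycles, double ghz)
{
    assert(ghz > 0.0);
    return (double)cycles / (ghz * 1000.0);   /* 1 GHz = 1000 cycles per microsecond */
}

/*
 * Rough skew window for a serialized late update: after the first cpu
 * finishes, the remaining (ncpus - 1) cpus keep running the old revision
 * until their own updates complete.  Gaps between cpus are ignored.
 */
static double skew_window_us(uint64_t cycles_per_cpu, unsigned ncpus, double ghz)
{
    if (ncpus < 2)
        return 0.0;
    return (double)(ncpus - 1) * cycles_to_us(cycles_per_cpu, ghz);
}
```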

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh