linux-kernel - Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180228135931.uwveegfdv5afozxe@khazad-dum.debian.net>
Date:   Wed, 28 Feb 2018 10:59:31 -0300
From:   Henrique de Moraes Holschuh <hmh@....eng.br>
To:     Borislav Petkov <bp@...en8.de>
Cc:     X86 ML <x86@...nel.org>,
        Arjan Van De Ven <arjan.van.de.ven@...el.com>,
        Ashok Raj <ashok.raj@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

On Wed, 28 Feb 2018, Borislav Petkov wrote:
> + * Late loading dance. Why the heavy-handed stomp_machine effort?
> + *
> + * - HT siblings must be idle and not execute other code while the other sibling
> + *   is loading microcode in order to avoid any negative interactions caused by
> + *   the loading.
> + *
> + * - In addition, microcode update on the cores must be serialized until this
> + *   requirement can be relaxed in the future. Right now, this is conservative
> + *   and good.

Eek! If I read that right, this effectively halts the entire box until
every core is updated, with one core entering deep-coma at a time (the
rest are left either spinning or cpu_relax()ing depending on whether
they have already updated or not)?

If this is correct, I shudder at what it would do on a server with
dozens, or hundreds of cores...  According to Ben Hawkes' paper, Intel's
on-die microcode update loader takes linear time relative to the update
size to do the crypto dance.

On my single-xeon X5550 workstation, which should be relatively fast
since its microcode update is small, the whole thing would take about
3,2 million cycles (circa 800k cycles per core, 4 cores, skipping
hyperthreads) to do a sync late update.  I don't believe this has
changed much, but I *did not* test, e.g., a Skylake Xeon, or anything
newer than that Xeon X5550.

Anyway, maybe there is a safe way to do it in a more parallel fashion
based on cpu topology?

AFAIK, it is not like there is any way to make OS microcode updates
(early or late) safe against SMIs and NMIs hitting the sibling
hyperthread while updating the other, so we don't have to care about
*that* nasty corner case simply because we can't avoid it in the first
place.

Hopefully AMD has none of those pitfalls, and could just trigger an
update on half the cores at a time, easily bounding it to approximately
twice the time it takes to update a single core :-(

-- 
  Henrique Holschuh