Message-ID: <20180228135931.uwveegfdv5afozxe@khazad-dum.debian.net>
Date:   Wed, 28 Feb 2018 10:59:31 -0300
From:   Henrique de Moraes Holschuh <hmh@....eng.br>
To:     Borislav Petkov <bp@...en8.de>
Cc:     X86 ML <x86@...nel.org>,
        Arjan Van De Ven <arjan.van.de.ven@...el.com>,
        Ashok Raj <ashok.raj@...el.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 7/7] x86/microcode: Synchronize late microcode loading

On Wed, 28 Feb 2018, Borislav Petkov wrote:
> + * Late loading dance. Why the heavy-handed stomp_machine effort?
> + *
> + * - HT siblings must be idle and not execute other code while the other sibling
> + *   is loading microcode in order to avoid any negative interactions caused by
> + *   the loading.
> + *
> + * - In addition, microcode update on the cores must be serialized until this
> + *   requirement can be relaxed in the future. Right now, this is conservative
> + *   and good.

Eek! If I read that right, this effectively halts the entire box until
every core is updated, with one core at a time entering a deep coma
(the rest left either spinning or in cpu_relax(), depending on whether
they have already been updated or not)?
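A toy Python model of the serialized scheme described above, for illustration only: every simulated CPU rendezvouses, performs its "update" while holding a single lock (so only one CPU is ever updating), then waits until all CPUs are done. The threads, barriers and the `serialized_late_load` name are assumptions made for this sketch. The patch itself relies on the kernel's stop_machine machinery, not on anything like this.

```python
import threading


def serialized_late_load(num_cpus: int) -> list[int]:
    """Toy model of a stop_machine()-style late load (not kernel code).

    Every simulated CPU first rendezvouses, then applies its "update" while
    holding one lock, so only one CPU is ever inside the update at a time.
    A second rendezvous keeps everyone parked until the last CPU is done.
    Returns the order in which the CPUs performed their update.
    """
    enter = threading.Barrier(num_cpus)  # all CPUs stop normal work first
    leave = threading.Barrier(num_cpus)  # nobody resumes until all are updated
    update_lock = threading.Lock()       # serializes the per-CPU update step
    order: list[int] = []

    def cpu(cpu_id: int) -> None:
        enter.wait()
        with update_lock:
            # Hypothetical stand-in for the actual microcode write on this CPU.
            order.append(cpu_id)
        leave.wait()

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(serialized_late_load(4))  # some permutation of [0, 1, 2, 3]
```

The resulting order is nondeterministic; the point is only that the update step never overlaps between CPUs while the whole "box" stays parked.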

If this is correct, I shudder at what it would do on a server with
dozens or hundreds of cores... According to Ben Hawkes' paper, Intel's
on-die microcode update loader takes time linear in the update size to
do the crypto dance.

On my single-Xeon X5550 workstation, which should be relatively fast
since its microcode update is small, a synchronized late update would
take about 3.2 million cycles in total (circa 800k cycles per core,
4 cores, not counting hyperthreads). I don't believe this has changed
much, but I *did not* test, e.g., a Skylake Xeon, or anything newer
than that Xeon X5550.
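The arithmetic above as a small Python sketch: in a fully serialized update the per-core costs simply add up (4 × 800k = 3.2M cycles). Extending it to bigger machines assumes the ~800k-cycle per-core figure from the X5550 carries over unchanged, which was not measured; the core counts and the 2.0 GHz clock used below are hypothetical.

```python
def serialized_cycles(cores: int, cycles_per_core: int) -> int:
    """Total cost when cores load microcode strictly one after another."""
    return cores * cycles_per_core


def cycles_to_ms(cycles: int, clock_ghz: float) -> float:
    """Convert a cycle count to milliseconds at an assumed clock frequency."""
    return cycles / (clock_ghz * 1e6)  # clock_ghz * 1e6 = cycles per ms


# ~800k cycles per core is the X5550 figure quoted above; assuming it stays
# constant per core on other CPUs is an extrapolation, not a measurement.
PER_CORE = 800_000

if __name__ == "__main__":
    for cores in (4, 28, 56, 224):  # hypothetical core counts
        total = serialized_cycles(cores, PER_CORE)
        print(f"{cores:4d} cores: {total:>12,} cycles, "
              f"~{cycles_to_ms(total, 2.0):.1f} ms at an assumed 2.0 GHz")
```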

Anyway, maybe there is a safe way to do it in a more parallel fashion
based on cpu topology?
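One possible shape of a topology-based plan, sketched in Python under loud assumptions: one thread per physical core loads microcode while its HT siblings stay idle, and all cores proceed in parallel. Whether parallel loading across cores is actually safe is exactly the open question; the sketch only shows the grouping step. The `plan_parallel_update` name and the dict input are hypothetical; on Linux the (package, core) pair could be read from `/sys/devices/system/cpu/cpuN/topology/physical_package_id` and `core_id`.

```python
from collections import defaultdict


def plan_parallel_update(
    topology: dict[int, tuple[int, int]],
) -> tuple[list[int], list[int]]:
    """Split logical CPUs into microcode loaders and idle HT siblings.

    topology maps a logical CPU number to its (package_id, core_id) pair.
    The lowest-numbered thread of each physical core is chosen to load;
    its siblings are left idle for the duration of the update.
    """
    threads_per_core: dict[tuple[int, int], list[int]] = defaultdict(list)
    for cpu, core in topology.items():
        threads_per_core[core].append(cpu)

    loaders: list[int] = []
    idlers: list[int] = []
    for cpus in threads_per_core.values():
        first, *siblings = sorted(cpus)
        loaders.append(first)
        idlers.extend(siblings)
    return sorted(loaders), sorted(idlers)


if __name__ == "__main__":
    # Hypothetical 4-core/8-thread box where CPU n and n+4 are HT siblings.
    topo = {cpu: (0, cpu % 4) for cpu in range(8)}
    print(plan_parallel_update(topo))  # ([0, 1, 2, 3], [4, 5, 6, 7])
```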

AFAIK, there is no way to make OS microcode updates (early or late)
safe against SMIs and NMIs hitting one hyperthread while its sibling
is being updated, so we don't have to care about *that* nasty corner
case, simply because we can't avoid it in the first place.

Hopefully AMD has none of those pitfalls, and we could just trigger an
update on half the cores at a time, easily bounding it to approximately
twice the time it takes to update a single core :-(
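A rough Python cost model for that batching idea: if `batch_size` cores update in parallel per round, wall time is about ceil(cores / batch_size) single-core updates, so "half the cores at a time" means two rounds. It assumes cores in the same batch do not slow each other down, which is not established for any vendor; the 64-core count and the reuse of the 800k-cycle X5550 figure are placeholders.

```python
def batched_cycles(cores: int, batch_size: int, cycles_per_core: int) -> int:
    """Wall-clock cycles when `batch_size` cores update in parallel per round.

    Assumes cores in the same batch do not slow each other down, so each
    round costs about one single-core update.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rounds = -(-cores // batch_size)  # integer ceil(cores / batch_size)
    return rounds * cycles_per_core


if __name__ == "__main__":
    t = 800_000  # per-core cost; placeholder borrowed from the X5550 figure
    cores = 64   # hypothetical core count
    print(batched_cycles(cores, 1, t))           # fully serialized: 64 * t
    print(batched_cycles(cores, cores // 2, t))  # half at a time: 2 * t
```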

-- 
  Henrique Holschuh
