Message-ID: <Y9mDYMASXCFaFkNU@zn.tnic>
Date: Tue, 31 Jan 2023 22:08:48 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "Raj, Ashok" <ashok.raj@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
"Hansen, Dave" <dave.hansen@...el.com>,
"Schofield, Alison" <alison.schofield@...el.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
Stefan Talpalaru <stefantalpalaru@...oo.com>,
David Woodhouse <dwmw2@...radead.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Jonathan Corbet <corbet@....net>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Peter Zilstra <peterz@...radead.org>,
"Lutomirski, Andy" <luto@...nel.org>,
"andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
"Ostrovsky, Boris" <boris.ostrovsky@...cle.com>,
Martin Pohlack <mpohlack@...zon.de>
Subject: Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu_info()
to reflect current microcode
On Tue, Jan 31, 2023 at 08:49:52PM +0000, Luck, Tony wrote:
> What happens here if the update on the first hyperthread failed (sure, it shouldn't,
> but stuff happens at large scale)? In this case the current rev is still older than
> the cache version ... so there is no "goto out", and this hyperthread will now write
> the MSR to initiate microcode update here, while the first thread is off executing
> arbitrary code (the situation that we want to avoid).
Lemme see if I can follow: we sync all threads in __reload_late() and
once they all arrive, we send them down into ->apply_microcode.
T0 arrives, and fails the update. That is this piece:
	/* write microcode via MSR 0x79 */
	wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);
	rev = intel_get_microcode_revision();
	if (rev != mc->hdr.rev) {
		pr_err("CPU%d update to revision 0x%x failed\n",
		       cpu, mc->hdr.rev);
		return UCODE_ERROR;
	}
We return here without updating cpu_sig.rev, as we should.
T1 arrives, updates successfully and updates its cpu_sig.rev.
T0's patch level has been updated by that too, because the microcode
engine is shared between the threads. T0's cpu_sig.rev hasn't been,
however, as the update happened "behind its back", so to speak.
Is that the scenario you're talking about?
If so, if you look at __reload_late(), it'll say
	pr_warn("Error reloading microcode on CPU %d\n", cpu);
and the large scale operator will know.
And well, the easy fix is, do the reload again. :-)
That'll update the cached values too.
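
To make the sequence above concrete, here is a rough user-space model of
it. This is only a sketch, not kernel code: struct core, struct thread,
cached_rev (standing in for cpu_sig.rev), the fail_next_write knob and
the simplified apply_microcode()/reload_late() below are all made up for
illustration. In particular, the "already current, just refresh the
cache" branch is an assumption about how a second reload ends up fixing
the cached value, not a copy of the real driver logic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model of the scenario; not kernel code. */
enum ucode_state { UCODE_OK, UCODE_UPDATED, UCODE_ERROR };

struct core {
	unsigned int hw_rev;		/* revision in the shared microcode engine */
};

struct thread {
	struct core *core;		/* both hyperthreads point at the same core */
	unsigned int cached_rev;	/* stands in for cpu_sig.rev */
	bool fail_next_write;		/* illustration knob: next update fails once */
};

/* Simplified stand-in for the MSR write plus the revision check. */
static enum ucode_state apply_microcode(struct thread *t, unsigned int new_rev)
{
	/* Assumption: if the engine is already current, only refresh the cache. */
	if (t->core->hw_rev >= new_rev) {
		t->cached_rev = t->core->hw_rev;
		return UCODE_OK;
	}

	if (t->fail_next_write) {
		t->fail_next_write = false;
		return UCODE_ERROR;	/* cached_rev is left untouched */
	}

	t->core->hw_rev = new_rev;	/* the sibling thread sees this too */
	t->cached_rev = new_rev;
	return UCODE_UPDATED;
}

/* Walks all threads once and returns how many reported an error. */
static int reload_late(struct thread *threads, int n, unsigned int new_rev)
{
	int errors = 0;

	for (int i = 0; i < n; i++) {
		if (apply_microcode(&threads[i], new_rev) == UCODE_ERROR) {
			fprintf(stderr, "Error reloading microcode on CPU %d\n", i);
			errors++;
		}
	}
	return errors;
}
```

With two threads on one core at revision 0x10 and fail_next_write set on
the first one, a first reload_late(threads, 2, 0x20) reports one error
and leaves thread 0 with a stale cached_rev of 0x10 even though the
engine is at 0x20. A second call reports no errors and brings thread 0's
cached_rev up to 0x20.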
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette