Message-ID: <Y9qBmugSm+o5u4pq@a4bf019067fa.jf.intel.com>
Date: Wed, 1 Feb 2023 07:13:30 -0800
From: Ashok Raj <ashok.raj@...el.com>
To: Borislav Petkov <bp@...en8.de>
CC: "Luck, Tony" <tony.luck@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
"Hansen, Dave" <dave.hansen@...el.com>,
"Schofield, Alison" <alison.schofield@...el.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
"Stefan Talpalaru" <stefantalpalaru@...oo.com>,
David Woodhouse <dwmw2@...radead.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Jonathan Corbet <corbet@....net>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
"Lutomirski, Andy" <luto@...nel.org>,
"andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
"Ostrovsky, Boris" <boris.ostrovsky@...cle.com>,
Martin Pohlack <mpohlack@...zon.de>,
Ashok Raj <ashok.raj@...el.com>
Subject: Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu_info()
to reflect current microcode
On Wed, Feb 01, 2023 at 01:53:32PM +0100, Borislav Petkov wrote:
> On Tue, Jan 31, 2023 at 10:43:23PM +0000, Luck, Tony wrote:
> > In an ideal world yes. But what if T1 arrives here and tries to do the
> > update while T0, which has returned out of the microcode update
> > code and could be doing anything, happens to be doing WRMSR(some MSR
> > that the ucode update is tinkering with).
> >
> > Now T0 explodes (not literally, I hope!) but does something crazy because
> > it was in the middle of some microcode flow that got updated between two
> > operations.
>
> So first of all, I'm wondering whether the scenario you're chasing is
> something completely hypothetical or you're actually thinking of
> something concrete which has actually happened or there's high potential
> for it.
>
> In that case, that late patching sync algorithm would need to be made
> more robust to handle cases like that.

That's correct. But fundamentally, we send the sibling down the
apply_microcode() path only to make sure its per-thread info gets updated.
The code relies on a side effect, namely that the revision gets refreshed,
even though we don't intend to perform a WRMSR on the sibling in the
normal case where the primary thread completes the update.

If the purpose is only to update the revision, collect_cpu_info() seems
more appropriate: it reads back the current revision and carries none of
the implied issues of going through the update WRMSR flow. It's not broken
today, but the code isn't future-proof. Calling only the revision update
keeps those questions at bay.

I think this is the cleanup Thomas was suggesting in his comments.
>
> Because from what I'm reading above, this doesn't sound like the
> reporting is wrong only but more like, if T0 fails the update and T1
> gets to do that update for a change, then crap can happen.
>
> Which means, our update dance cannot handle that case properly.
>
It doesn't need to if we don't do an apply_microcode() for the sibling.
Cheers,
Ashok