Message-ID: <Y2vQRMyOndQtG/yJ@a4bf019067fa.jf.intel.com>
Date: Wed, 9 Nov 2022 08:07:32 -0800
From: Ashok Raj <ashok.raj@...el.com>
To: Borislav Petkov <bp@...en8.de>
CC: Thomas Gleixner <tglx@...utronix.de>,
LKML Mailing List <linux-kernel@...r.kernel.org>,
X86-kernel <x86@...nel.org>, Tony Luck <tony.luck@...el.com>,
Dave Hansen <dave.hansen@...el.com>,
Arjan van de Ven <arjan.van.de.ven@...el.com>,
Andy Lutomirski <luto@...nel.org>,
"Jacon Jun Pan" <jacob.jun.pan@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
"Kai Huang" <kai.huang@...el.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
"Ashok Raj" <ashok.raj@...el.com>
Subject: Re: [v2 03/13] x86/microcode/intel: Fix a hang if early loading
microcode fails
On Wed, Nov 09, 2022 at 12:25:02PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:58:51PM +0000, Ashok Raj wrote:
> > When early loading of microcode fails for any reason other than the wrong
> > family-model-stepping, Linux can get into an infinite loop retrying the
> > same failed load.
> >
> > A single retry is needed to handle any mixed stepping case.
> >
> > Assume we have a microcode that fails to load for some reason.
> > load_ucode_ap() seems to retry if the loading fails. But it searches for
>
> Seems to retry because we were supporting mixed revisions. Which we do
> not now.
The retry itself wasn't the problem; hitting the same failed microcode over
and over is. That is called out in the commit log.
As part of dropping mixed stepping, we can drop this retry.
Maybe the right way is to remember whether the load failed on the BSP; if
it did, there is no point in trying to apply it on the APs.
reload_early_microcode->reload_ucode_intel()
  ->apply_microcode_intel()
We aren't checking whether the early load failed on the BSP. We should save
that result and skip loading on all APs.
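A minimal user-space sketch of that idea, not kernel code: every name here (bsp_early_load_failed, apply_microcode_stub(), load_ucode_bsp_sketch(), load_ucode_ap_sketch()) is hypothetical and only illustrates "record the BSP result once, then skip the APs".

```c
#include <stdbool.h>

/* Hypothetical flag: set once if the early load on the BSP fails. */
static bool bsp_early_load_failed;

/* Stand-in for the real apply step: 0 on success, -1 on failure. */
static int apply_microcode_stub(int cpu, bool patch_is_loadable)
{
    (void)cpu;
    return patch_is_loadable ? 0 : -1;
}

/* BSP path: try exactly once and remember a failure. */
static int load_ucode_bsp_sketch(bool patch_is_loadable)
{
    int ret = apply_microcode_stub(0, patch_is_loadable);

    if (ret)
        bsp_early_load_failed = true;
    return ret;
}

/* AP path: returns 1 (skipped) if the BSP already failed, no retry. */
static int load_ucode_ap_sketch(int cpu, bool patch_is_loadable)
{
    if (bsp_early_load_failed)
        return 1;
    return apply_microcode_stub(cpu, patch_is_loadable);
}
```

With this shape an AP never re-searches for or re-applies a patch that the BSP has already rejected, so the same failing blob cannot be hit repeatedly.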
>
> And if you say "seems" then this sounds like the problem hasn't been
> analyzed properly. If this can happen with the current code, then this
> needs to be fixed in stable. So, how do you trigger exactly?
>
> I'd like to reproduce it myself.
Certainly. Take the fms+pf of the platform you are testing.
- Take a microcode file from the distribution for a different fms, one that
doesn't belong to the platform you are testing.
- Fake the external header data: change it to the values of your platform
so that the microcode match succeeds.
- Recompute all checksums and use that file instead of the original file.
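The checksum step above can be sketched roughly as follows. This assumes the documented Intel update layout, where the header checksum is the fifth 32-bit word (byte offset 16) and a valid image sums to zero over all of its 32-bit words. The helper names are made up for illustration, and any extended signature table checksums are not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Dword index of the checksum field in the microcode header (offset 16). */
#define UCODE_CHECKSUM_DWORD 4

/* Sum every 32-bit word of the image; a valid image sums to 0 mod 2^32. */
static uint32_t ucode_dword_sum(const uint32_t *image, size_t ndwords)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < ndwords; i++)
        sum += image[i];
    return sum;
}

/* Rewrite the header checksum so the whole image sums to zero again. */
static void ucode_fix_checksum(uint32_t *image, size_t ndwords)
{
    image[UCODE_CHECKSUM_DWORD] = 0;
    image[UCODE_CHECKSUM_DWORD] = 0u - ucode_dword_sum(image, ndwords);
}
```

After editing the signature or processor flags in the header, calling ucode_fix_checksum() on the whole image (total size in bytes divided by 4) restores the zero-sum property.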
I accidentally ran into it since I had a copy of debug uCode that requires
additional steps before loading.
I have a tool that I can change to give you some production microcode that
will fail on your platform. Just provide me with the fms+pf values, and I
can provide one for your test.
Let me know if you need one for testing.
>
> As to this patch: it should simply be removing the retrying instead of
> doing silly crap like
>
> bool retried = false;
>
> ...
>
> In light of how a lot has changed since last time, yes, please redo the
> patchset ontop of tip:x86/microcode, keeping in mind now that we don't
> support mixed revisions anymore.
>
> Just like dhansen said, you can split it in fixes and new features so
> that it is not too many patches at once - your call.
That makes sense, I'll send the bug fix patches separately.
Cheers,
Ashok