Message-ID: <SJ1PR11MB60830F29ECFA61275F56CA4FFC4F2@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Fri, 25 Oct 2024 23:57:59 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>, "x86@...nel.org" <x86@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: Borislav Petkov <bp@...en8.de>, Thomas Gleixner <tglx@...utronix.de>, Ingo
Molnar <mingo@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>, "H.
Peter Anvin" <hpa@...or.com>, Benjamin Herrenschmidt <benh@...zon.com>
Subject: RE: WARNING in lmce_supported() during reboot.
> and the triggered WARN_ON_ONCE() in lmce_supported() is here.
> https://github.com/amazonlinux/linux/blob/kernel-6.1.61-85.141.amzn2023/arch/x86/kernel/cpu/mce/intel.c#L124
So the warning is this one:
if (WARN_ON_ONCE(!(tmp & FEAT_CTL_LOCKED)))
It checks that MSR_IA32_FEAT_CTL (MSR 0x3a) has been correctly set and
locked by the BIOS, i.e. that LMCE mode can't be snatched away by
someone rewriting this MSR.
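A minimal C sketch of what that check tests, assuming the bit layout discussed in this thread (lock = bit 0, LMCE enable = bit 20 of MSR 0x3a). The helper names are hypothetical and this is not the kernel source. It follows the same logic as the quoted WARN_ON_ONCE(), plus the LMCE-bit test that lmce_supported() applies once the lock bit is set:

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_FEATURE_CONTROL (MSR 0x3a) bits referred to in this thread. */
#define FEAT_CTL_LOCKED        (1ULL << 0)   /* lock bit: MSR read-only until reset */
#define FEAT_CTL_LMCE_ENABLED  (1ULL << 20)  /* BIOS opt-in for LMCE */

/* Hypothetical helper: true when the value would NOT trigger the warning. */
bool feat_ctl_locked(uint64_t val)
{
    return (val & FEAT_CTL_LOCKED) != 0;
}

/* Hypothetical helper: LMCE usable only if the MSR is locked AND bit 20 is set. */
bool feat_ctl_lmce_usable(uint64_t val)
{
    return feat_ctl_locked(val) && (val & FEAT_CTL_LMCE_ENABLED) != 0;
}
```

A value of 0x100001 (bits 20 and 0 set) passes both checks, while 0x100000 has LMCE enabled but is unlocked, which is exactly the case that fires the warning.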
That said, you ought to hit it either every time or never, so this "sometimes"
state is weird.
Which CPU model do you see this on?
Can you please use the rdmsr/wrmsr commands from msr-tools to:
a) read this MSR on all CPUs and check that it has the same value on each one
and that bit 0 is set to 1.
b) try writing to this MSR (e.g. clear the lock bit (bit 0) or the LMCE bit (bit 20))
and see whether the write succeeds.
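Step (a) could also be scripted. The sketch below assumes the Linux msr driver is loaded and the program runs as root. It reads MSR 0x3a through /dev/cpu/N/msr, where the file offset selects the MSR (the same interface rdmsr uses). read_msr() and feat_ctl_consistent() are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one MSR on one CPU via the msr driver; returns 0 on success, -1 on error. */
int read_msr(int cpu, uint32_t msr, uint64_t *val)
{
    char path[64];
    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;                       /* no such CPU, no msr driver, or not root */

    /* The file offset is the MSR index; each read returns 8 bytes. */
    ssize_t n = pread(fd, val, sizeof(*val), (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Check (a): every CPU reports the same value and bit 0 (lock) is set. */
bool feat_ctl_consistent(const uint64_t *vals, int ncpus)
{
    if (ncpus <= 0)
        return false;
    for (int i = 0; i < ncpus; i++) {
        if (vals[i] != vals[0] || (vals[i] & 1ULL) == 0)
            return false;
    }
    return true;
}
```

Assuming CPUs are numbered contiguously from 0, a caller would fill vals[cpu] with read_msr(cpu, 0x3a, &vals[cpu]) for each CPU and pass the array to feat_ctl_consistent(). Any CPU with a different value or a clear lock bit is the interesting result to report.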
-Tony