Message-ID: <1da27840413348febf301ef39305de12@zhaoxin.com>
Date: Tue, 17 Sep 2019 06:54:05 +0000
From: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: "Borislav Petkov (bp@...en8.de)" <bp@...en8.de>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"yazen.ghannam@....com" <yazen.ghannam@....com>,
"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
"qiuxu.zhuo@...el.com" <qiuxu.zhuo@...el.com>,
David Wang <DavidWang@...oxin.com>,
"Cooper Yan(BJ-RD)" <CooperYan@...oxin.com>,
"Qiyuan Wang(BJ-RD)" <QiyuanWang@...oxin.com>,
"Herry Yang(BJ-RD)" <HerryYang@...oxin.com>
Subject: Re: [PATCH v3 4/4] x86/mce: Add Zhaoxin LMCE support
On Mon, Sep 16, 2019, Luck, Tony wrote:
>On Mon, Sep 16, 2019 at 11:37:18AM +0000, Tony W Wang-oc wrote:
>> Zhaoxin newer CPUs support LMCE that compatible with Intel's
>> "Machine-Check Architecture", so add support for Zhaoxin LMCE
>> in mce/core.c.
>>
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
>> ---
>> arch/x86/kernel/cpu/mce/core.c | 35 +++++++++++++++++++++++++++++++++--
>> 1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 65c5a1f..acdd76b 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1132,6 +1132,27 @@ static bool __mc_check_crashing_cpu(int cpu)
>> u64 mcgstatus;
>>
>> mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>> +
>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
>> + if (mcgstatus & MCG_STATUS_LMCES)
>> + return false;
>> +
>> + if (!(mcgstatus & MCG_STATUS_LMCES)) {
>
>Don't really need this test ... you already did "return false" if
>the LMCES bit was set ... so this test is redundant (and you can avoid
>indenting the next dozen lines).
Got it, thank you.
But I have a question about the code below:
	if (mcgstatus & MCG_STATUS_RIPV) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return true;
	}
This seems to require every #MC exception to set MCG_STATUS_RIPV = 1
in order to skip synchronization, since "return true;" is what
actually skips it.

As the Intel SDM shows, "Recoverable-not-continuable SRAR Type" errors
may set MCG_STATUS_RIPV = 0 and PCC = 0. When such a #MC is broadcast
to an offline CPU, it may cause a kernel panic from a synchronization
timeout (the offline CPU can't skip synchronization in this case).

Could "return true;" be moved outside the if block?
	if (mcgstatus & MCG_STATUS_RIPV) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
	}
	return true;
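
To make the difference concrete, here is a minimal user-space sketch of
the two variants. It is not kernel code: the function names and the
pointer that stands in for the MSR are hypothetical, and it models only
the if block quoted above, not the rest of __mc_check_crashing_cpu().
The bit positions follow the IA32_MCG_STATUS layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits (RIPV = bit 0, EIPV = bit 1, MCIP = bit 2). */
#define MCG_STATUS_RIPV	(1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV	(1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP	(1ULL << 2)	/* machine check in progress */

/*
 * Current logic (hypothetical name): the offline CPU skips
 * synchronization only when RIPV is set. *mcg_status stands in for
 * the MSR; writing 0 models mce_wrmsrl(MSR_IA32_MCG_STATUS, 0).
 */
static bool offline_skips_sync_current(uint64_t *mcg_status)
{
	if (*mcg_status & MCG_STATUS_RIPV) {
		*mcg_status = 0;
		return true;
	}
	return false;
}

/*
 * Proposed logic (hypothetical name): the offline CPU always skips
 * synchronization. Note that, as written, the status is still only
 * cleared when RIPV is set.
 */
static bool offline_skips_sync_proposed(uint64_t *mcg_status)
{
	if (*mcg_status & MCG_STATUS_RIPV)
		*mcg_status = 0;
	return true;
}
```

With a status of MCIP | EIPV (RIPV = 0, as in the SRAR case above), the
first variant returns false and the offline CPU goes on to the
synchronization path, while the second returns true. In the second
variant the status value is left unchanged in that case, which may or
may not be what is wanted.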
Sincerely
TonyWWang-oc