Message-ID: <985acf114ab245fbab52caabf03bd280@zhaoxin.com>
Date: Thu, 30 May 2019 09:13:39 +0000
From: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
To: "tipbot@...or.com" <tipbot@...or.com>,
"ashok.raj@...el.com" <ashok.raj@...el.com>
CC: "bp@...e.de" <bp@...e.de>, "hpa@...or.com" <hpa@...or.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>,
"mingo@...nel.org" <mingo@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
David Wang <DavidWang@...oxin.com>
Subject: Reply: Re: [tip:x86/urgent] x86/mce: Ensure offline CPUs don't participate in rendezvous process
On Thu, May 30, 2019, Tony W Wang-oc wrote:
> Hi Ashok,
> I have two questions about this patch; could you help check them?
>
> 1. For broadcast #MC exceptions, this patch seems to require that the error
> sets MCG_STATUS_RIPV = 1.
> But on Intel CPUs, some #MC errors set MCG_STATUS_RIPV = 0
> (like "Recoverable-not-continuable SRAR Type" errors); for these errors
> the patch doesn't seem to work. Is that okay?
>
> 2. For LMCE exceptions, this patch seems to require that the error sets
> MCG_STATUS_RIPV = 0 to make sure the LMCE is handled normally even
> on an offline CPU.
> For LMCE errors that set MCG_STATUS_RIPV = 1, the patch prevents the
> offline CPU from handling them. Is that okay?
>
More specifically, this patch seems to require that #MC exceptions meet the condition
"MCG_STATUS_RIPV ^ MCG_STATUS_LMCES == 1", i.e. exactly one of the two bits is set.
But on a Xeon X5650 machine (SMP), a "Data CACHE Level-2 Generic Error" does not meet this condition.
I got the message below from: https://www.centos.org/forums/viewtopic.php?p=292742
Hardware event. This is not a software error.
MCE 0
CPU 4 BANK 6 TSC b7065eeaa18b0
TIME 1545643603 Mon Dec 24 10:26:43 2018
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-2 Generic Error
STATUS b200000080000106 MCGSTATUS 4
MCGCAP 1c09 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
> Thanks
> Tony W Wang-oc