Message-ID: <1da27840413348febf301ef39305de12@zhaoxin.com>
Date:   Tue, 17 Sep 2019 06:54:05 +0000
From:   Tony W Wang-oc <TonyWWang-oc@...oxin.com>
To:     "Luck, Tony" <tony.luck@...el.com>
CC:     "Borislav Petkov (bp@...en8.de)" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "yazen.ghannam@....com" <yazen.ghannam@....com>,
        "vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
        "qiuxu.zhuo@...el.com" <qiuxu.zhuo@...el.com>,
        David Wang <DavidWang@...oxin.com>,
        "Cooper Yan(BJ-RD)" <CooperYan@...oxin.com>,
        "Qiyuan Wang(BJ-RD)" <QiyuanWang@...oxin.com>,
        "Herry Yang(BJ-RD)" <HerryYang@...oxin.com>
Subject: Re: [PATCH v3 4/4] x86/mce: Add Zhaoxin LMCE support

On Mon, Sep 16, 2019, Luck, Tony wrote:
>On Mon, Sep 16, 2019 at 11:37:18AM +0000, Tony W Wang-oc wrote:
>> Zhaoxin newer CPUs support LMCE that compatible with Intel's
>> "Machine-Check Architecture", so add support for Zhaoxin LMCE
>> in mce/core.c.
>>
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@...oxin.com>
>> ---
>>  arch/x86/kernel/cpu/mce/core.c | 35 +++++++++++++++++++++++++++++++++--
>>  1 file changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 65c5a1f..acdd76b 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -1132,6 +1132,27 @@ static bool __mc_check_crashing_cpu(int cpu)
>>  		u64 mcgstatus;
>>
>>  		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>> +
>> +		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
>> +			if (mcgstatus & MCG_STATUS_LMCES)
>> +				return false;
>> +
>> +			if (!(mcgstatus & MCG_STATUS_LMCES)) {
>
>Don't really need this test ... you already did "return false" if
>the LMCES bit was set ... so this test is redundant (and you can avoid
>indenting the next dozen lines).
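
To illustrate the review comment above, here is a minimal user-space sketch, not the actual patch. It assumes that MCG_STATUS_LMCES is bit 3 of IA32_MCG_STATUS, and it replaces the lines that follow in the patch with a hypothetical placeholder, rest_of_zhaoxin_path(), because that part of the diff is not quoted here. The nested and flat forms should return the same value for every input:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position: LMCE signaled flag in IA32_MCG_STATUS. */
#define MCG_STATUS_LMCES (1ULL << 3)

/* Hypothetical placeholder for the remaining lines of the patch. */
static bool rest_of_zhaoxin_path(uint64_t mcgstatus)
{
	(void)mcgstatus;
	return true;
}

/* Shape of the code as posted: the second test repeats the first. */
static bool check_nested(uint64_t mcgstatus)
{
	if (mcgstatus & MCG_STATUS_LMCES)
		return false;

	if (!(mcgstatus & MCG_STATUS_LMCES)) {
		return rest_of_zhaoxin_path(mcgstatus);
	}

	return false; /* never reached: LMCES is known to be clear here */
}

/* Shape after the review: the early return makes the second test unnecessary. */
static bool check_flat(uint64_t mcgstatus)
{
	if (mcgstatus & MCG_STATUS_LMCES)
		return false;

	return rest_of_zhaoxin_path(mcgstatus);
}
```

Calling check_nested() and check_flat() with the LMCES bit set and with it clear gives the same result in both cases, which is why the inner test and its extra indentation level can go.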

Got it, thank you.

But I have a question about the code below:
	if (mcgstatus & MCG_STATUS_RIPV) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return true;
	}
This seems to require every #MC exception to set MCG_STATUS_RIPV = 1
for an offline CPU to skip synchronization, which is what "return true;"
actually does here.

As the Intel SDM shows, "Recoverable-not-continuable SRAR Type" errors may
set MCG_STATUS_RIPV = 0, PCC = 0. When such #MC errors are broadcast
to an offline CPU, they may cause a kernel panic with a synchronization
timeout (the offline CPU can't skip synchronization in this case).

Could "return true;" be moved outside the if block, like this?
	if (mcgstatus & MCG_STATUS_RIPV) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
	}
	return true;
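
The difference between the two variants can be modeled outside the kernel. The following is only a sketch under stated assumptions: the MSR is replaced by a plain variable passed by pointer, the function names are hypothetical, and the bit positions (RIPV = bit 0, MCIP = bit 2) are taken from the IA32_MCG_STATUS layout in the Intel SDM. It models only the RIPV test for a CPU already known to be offline, not the whole of __mc_check_crashing_cpu().

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits (layout per the Intel SDM). */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/*
 * Existing logic: an offline CPU skips the handler (returns true)
 * only when RIPV is set, and clears MCG_STATUS in that case.
 * "msr" is a hypothetical stand-in for MSR_IA32_MCG_STATUS.
 */
static bool offline_skip_current(uint64_t *msr)
{
	uint64_t mcgstatus = *msr;

	if (mcgstatus & MCG_STATUS_RIPV) {
		*msr = 0;
		return true;
	}
	return false;
}

/*
 * Proposed logic: an offline CPU always skips the handler;
 * MCG_STATUS is still cleared only when RIPV is set.
 */
static bool offline_skip_proposed(uint64_t *msr)
{
	uint64_t mcgstatus = *msr;

	if (mcgstatus & MCG_STATUS_RIPV)
		*msr = 0;

	return true;
}
```

With *msr = MCG_STATUS_MCIP | MCG_STATUS_RIPV, both functions return true and clear the value. With *msr = MCG_STATUS_MCIP alone (RIPV = 0, as in the SRAR case above), offline_skip_current() returns false while offline_skip_proposed() returns true. In this model the proposed variant leaves the status value untouched when RIPV is clear, so MCIP stays set.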

Sincerely
TonyWWang-oc
