lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 16 Nov 2017 11:00:40 +0800
From:   Xie XiuQi <xiexiuqi@...wei.com>
To:     Borislav Petkov <bp@...en8.de>, "Luck, Tony" <tony.luck@...el.com>
CC:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        chen wei <chenwei68@...wei.com>
Subject: Re: [PATCH] x86/mce: add support SRAO reported via CMC check

Hi Borislav, Tony,

On 2017/11/15 18:33, Borislav Petkov wrote:
> On Wed, Nov 15, 2017 at 02:44:07AM +0000, Luck, Tony wrote:
>> This code is subtle :-(
> 
> I'm glad that we agree on this! :-)
> 
> Anyone wanting to rewrite it yet?
> 

In Intel SDM Volume 3B (253669-063US, July 2017), SRAO could be
reported either via MCE or CMC:

  In cases when SRAO is signaled via CMCI the error signature is
  indicated via UC=1, PCC=0, S=0.

  Type(*1)	UC	EN	PCC	S	AR	Signaling
  ---------------------------------------------------------------
  UC		1	1	1	x	x	MCE
  SRAR		1	1	0	1	1	MCE
  SRAO		1	x(*2)	0	x(*2)	0	MCE/CMC
  UCNA		1	x	0	0	0	CMC
  CE		0	x	x	x	x	CMC

  NOTES:
  1. SRAR, SRAO and UCNA errors are supported by the processor only
     when IA32_MCG_CAP[24] (MCG_SER_P) is set.
  2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.

And there is a description in 15.6.2 UCR Error Reporting and Logging, for bit S:

  S (Signaling) flag, bit 56 - Indicates (when set) that a machine check
  exception was generated for the UCR error reported in this MC bank...
  When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
  was not signaled via a machine check exception and instead was reported
  as a corrected machine check (CMC).

As the description in SDM, I think this flag could be used to determine whether
MCE or CMC was triggered. So we could merge this two case in one and just
remove the S=0 check for SRAO.

How about this patch?

>From a06b2a781a86e3b1fe241591b53f7a6d33d63331 Mon Sep 17 00:00:00 2001
From: Xie XiuQi <xiexiuqi@...wei.com>
Date: Tue, 14 Nov 2017 10:13:22 +0800
Subject: [PATCH] x86/mce: add support SRAO reported via CMC check

In Intel SDM Volume 3B (253669-063US, July 2017), SRAO could be
reported either via MCE or CMC:

  In cases when SRAO is signaled via CMCI the error signature is
  indicated via UC=1, PCC=0, S=0.

  Type(*1)	UC	EN	PCC	S	AR	Signaling
  ---------------------------------------------------------------
  UC		1	1	1	x	x	MCE
  SRAR		1	1	0	1	1	MCE
  SRAO		1	x(*2)	0	x(*2)	0	MCE/CMC
  UCNA		1	x	0	0	0	CMC
  CE		0	x	x	x	x	CMC

  NOTES:
  1. SRAR, SRAO and UCNA errors are supported by the processor only
     when IA32_MCG_CAP[24] (MCG_SER_P) is set.
  2. EN=1, S=1 when signaled via MCE. EN=x, S=0 when signaled via CMC.

And there is a description in 15.6.2 UCR Error Reporting and Logging, for bit S:

  S (Signaling) flag, bit 56 - Indicates (when set) that a machine check
  exception was generated for the UCR error reported in this MC bank...
  When the S flag in the IA32_MCi_STATUS register is clear, this UCR error
  was not signaled via a machine check exception and instead was reported
  as a corrected machine check (CMC).

So we could merge this two case, and just remove the S=0 check for SRAO
in mce_severity().

Signed-off-by: Xie XiuQi <xiexiuqi@...wei.com>
Tested-by: Chen Wei <chenwei68@...wei.com>
---
 arch/x86/kernel/cpu/mcheck/mce-severity.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 4ca632a..5bbd06f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -59,6 +59,7 @@
 #define  MCGMASK(x, y)	.mcgmask = x, .mcgres = y
 #define  MASK(x, y)	.mask = x, .result = y
 #define MCI_UC_S (MCI_STATUS_UC|MCI_STATUS_S)
+#define MCI_UC_AR (MCI_STATUS_UC|MCI_STATUS_AR)
 #define MCI_UC_SAR (MCI_STATUS_UC|MCI_STATUS_S|MCI_STATUS_AR)
 #define	MCI_ADDR (MCI_STATUS_ADDRV|MCI_STATUS_MISCV)

@@ -101,6 +102,22 @@
 		NOSER, BITCLR(MCI_STATUS_UC)
 		),

+	/*
+	 * known AO MCACODs reported via MCE or CMC:
+	 *
+	 * SRAO could be signaled either via a machine check exception or
+	 * CMCI with the corresponding bit S 1 or 0. So we don't need to
+	 * check bit S for SRAO.
+	 */
+	MCESEV(
+		AO, "Action optional: memory scrubbing error",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_AR|MCACOD_SCRUBMSK, MCI_STATUS_UC|MCACOD_SCRUB)
+		),
+	MCESEV(
+		AO, "Action optional: last level cache writeback error",
+		SER, MASK(MCI_STATUS_OVER|MCI_UC_AR|MCACOD, MCI_STATUS_UC|MCACOD_L3WB)
+		),
+
 	/* ignore OVER for UCNA */
 	MCESEV(
 		UCNA, "Uncorrected no action required",
@@ -149,15 +166,6 @@
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_SAR)
 		),

-	/* known AO MCACODs: */
-	MCESEV(
-		AO, "Action optional: memory scrubbing error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD_SCRUBMSK, MCI_UC_S|MCACOD_SCRUB)
-		),
-	MCESEV(
-		AO, "Action optional: last level cache writeback error",
-		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCACOD, MCI_UC_S|MCACOD_L3WB)
-		),
 	MCESEV(
 		SOME, "Action optional: unknown MCACOD",
 		SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR, MCI_UC_S)
-- 
1.8.3.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ