Message-ID: <tip-6bda529ec42e1cd4dde1c3d0a1a18000ffd3d419@git.kernel.org>
Date: Tue, 3 May 2016 00:47:17 -0700
From: tip-bot for Aravind Gopalakrishnan <tipbot@...or.com>
To: linux-tip-commits@...r.kernel.org
Cc: luto@...capital.net, dvlasenk@...hat.com, Yazen.Ghannam@....com,
aravindksg.lkml@...il.com, bp@...e.de, tglx@...utronix.de,
Aravind.Gopalakrishnan@....com, mingo@...nel.org,
peterz@...radead.org, brgerst@...il.com, bp@...en8.de,
torvalds@...ux-foundation.org, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org, tony.luck@...el.com, hpa@...or.com
Subject: [tip:ras/core] x86/mce: Grade uncorrected errors for SMCA-enabled
systems
Commit-ID: 6bda529ec42e1cd4dde1c3d0a1a18000ffd3d419
Gitweb: http://git.kernel.org/tip/6bda529ec42e1cd4dde1c3d0a1a18000ffd3d419
Author: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@....com>
AuthorDate: Sat, 30 Apr 2016 14:33:52 +0200
Committer: Ingo Molnar <mingo@...nel.org>
CommitDate: Tue, 3 May 2016 08:24:15 +0200
x86/mce: Grade uncorrected errors for SMCA-enabled systems
For upcoming processors with the Scalable MCA (SMCA) feature, we need to
check the "succor" CPUID bit and the TCC bit in the MCx_STATUS register
in order to grade an MCE's severity.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@....com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@....com>
[ Simplified code flow, shortened comments. ]
Signed-off-by: Borislav Petkov <bp@...e.de>
Cc: Andy Lutomirski <luto@...capital.net>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@...il.com>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Denys Vlasenko <dvlasenk@...hat.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Tony Luck <tony.luck@...el.com>
Cc: linux-edac <linux-edac@...r.kernel.org>
Link: http://lkml.kernel.org/r/1459886686-13977-3-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1462019637-16474-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
arch/x86/kernel/cpu/mcheck/mce-severity.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 5119766..631356c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -204,6 +204,33 @@ static int error_context(struct mce *m)
 	return IN_KERNEL;
 }
 
+static int mce_severity_amd_smca(struct mce *m, int err_ctx)
+{
+	u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+	u32 low, high;
+
+	/*
+	 * We need to look at the following bits:
+	 * - "succor" bit (data poisoning support), and
+	 * - TCC bit (Task Context Corrupt)
+	 * in MCi_STATUS to determine error severity.
+	 */
+	if (!mce_flags.succor)
+		return MCE_PANIC_SEVERITY;
+
+	if (rdmsr_safe(addr, &low, &high))
+		return MCE_PANIC_SEVERITY;
+
+	/* TCC (Task context corrupt). If set and if IN_KERNEL, panic. */
+	if ((low & MCI_CONFIG_MCAX) &&
+	    (m->status & MCI_STATUS_TCC) &&
+	    (err_ctx == IN_KERNEL))
+		return MCE_PANIC_SEVERITY;
+
+	 /* ...otherwise invoke hwpoison handler. */
+	return MCE_AR_SEVERITY;
+}
+
 /*
  * See AMD Error Scope Hierarchy table in a newer BKDG. For example
  * 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features"
@@ -225,6 +252,9 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 		 * to at least kill process to prolong system operation.
 		 */
 		if (mce_flags.overflow_recov) {
+			if (mce_flags.smca)
+				return mce_severity_amd_smca(m, ctx);
+
 			/* software can try to contain */
 			if (!(m->mcgstatus & MCG_STATUS_RIPV) && (ctx == IN_KERNEL))
 				return MCE_PANIC_SEVERITY;
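
For readers who want to poke at the grading rules outside the kernel, the
decision order in mce_severity_amd_smca() can be modelled as a pure function.
This is only a userspace sketch: every sketch_* and SKETCH_* name is
hypothetical, the bit positions chosen for the TCC and MCAX stand-ins are
assumptions rather than values taken from this patch, and the rdmsr_safe()
call is reduced to a boolean saying whether the bank's config register could
be read. Note that "succor" is a CPUID feature flag, while TCC is a bit in
the bank's status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative stand-ins, not kernel definitions. The bit positions are
 * assumptions made for this sketch.
 */
#define SKETCH_STATUS_TCC	(1ULL << 55)	/* Task Context Corrupt, status register */
#define SKETCH_CONFIG_MCAX	0x1u		/* MCAX bit, low half of config register */

enum sketch_context  { SKETCH_IN_USER, SKETCH_IN_KERNEL };
enum sketch_severity { SKETCH_AR_SEVERITY, SKETCH_PANIC_SEVERITY };

/*
 * Same check order as the patch: succor flag, config register read,
 * then MCAX + TCC + kernel context. Anything that survives all three
 * checks is graded "action required".
 */
enum sketch_severity sketch_grade_smca(bool succor, bool config_readable,
				       uint32_t config_low, uint64_t status,
				       enum sketch_context ctx)
{
	/* No succor (data poisoning support): grade as panic. */
	if (!succor)
		return SKETCH_PANIC_SEVERITY;

	/* Models rdmsr_safe() failing on the bank's config register. */
	if (!config_readable)
		return SKETCH_PANIC_SEVERITY;

	/* TCC set with MCAX enabled while in kernel context: panic. */
	if ((config_low & SKETCH_CONFIG_MCAX) &&
	    (status & SKETCH_STATUS_TCC) &&
	    ctx == SKETCH_IN_KERNEL)
		return SKETCH_PANIC_SEVERITY;

	/* Otherwise the error is handed to the recovery path. */
	return SKETCH_AR_SEVERITY;
}

/* Usage example: one call per branch of the function above. */
void sketch_self_check(void)
{
	assert(sketch_grade_smca(false, true, SKETCH_CONFIG_MCAX, 0,
				 SKETCH_IN_USER) == SKETCH_PANIC_SEVERITY);
	assert(sketch_grade_smca(true, false, SKETCH_CONFIG_MCAX, 0,
				 SKETCH_IN_USER) == SKETCH_PANIC_SEVERITY);
	assert(sketch_grade_smca(true, true, SKETCH_CONFIG_MCAX,
				 SKETCH_STATUS_TCC,
				 SKETCH_IN_KERNEL) == SKETCH_PANIC_SEVERITY);
	assert(sketch_grade_smca(true, true, SKETCH_CONFIG_MCAX,
				 SKETCH_STATUS_TCC,
				 SKETCH_IN_USER) == SKETCH_AR_SEVERITY);
}
```

In the patch itself this path is only reached from mce_severity_amd() when
both mce_flags.overflow_recov and mce_flags.smca are set; the sketch leaves
that dispatch out and models the grading function alone.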