Message-Id: <1432916882-1304-4-git-send-email-ashok.raj@intel.com>
Date: Fri, 29 May 2015 09:28:02 -0700
From: Ashok Raj <ashok.raj@...el.com>
To: linux-kernel@...r.kernel.org
Cc: Ashok Raj <ashok.raj@...el.com>, linux-edac@...r.kernel.org,
Borislav Petkov <bp@...e.de>, Tony Luck <tony.luck@...el.com>
Subject: [Patch V1 3/3] x86, mce: Handling LMCE events
This patch changes do_machine_check() to handle machine checks that are
signaled as a local MCE (LMCE). Typically only software recoverable
action required (SRAR) errors are signaled as LMCE, but the architecture
does not restrict LMCE to those errors.
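
A minimal user-space sketch of the check this patch adds: the handler
reads IA32_MCG_STATUS and treats bit 3 (LMCE_S, MCG_STATUS_LMCES in the
kernel) as "this machine check was delivered only to this logical
processor". The helper name mce_is_local() is hypothetical; the bit
positions follow the SDM chapter referenced in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits (Intel SDM Vol. 3, Chapter 15). */
#define MCG_STATUS_RIPV  (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV  (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP  (1ULL << 2) /* machine check in progress */
#define MCG_STATUS_LMCES (1ULL << 3) /* local MCE signaled (LMCE_S) */

static_assert(MCG_STATUS_LMCES == 0x8, "LMCE_S is bit 3 of IA32_MCG_STATUS");

/* Hypothetical helper: true if only this logical CPU got the #MC. */
static bool mce_is_local(uint64_t mcgstatus)
{
	return (mcgstatus & MCG_STATUS_LMCES) != 0;
}
```

For example, a status value of 0xC (MCIP | LMCES) is reported as local,
while 0x4 (MCIP alone) is a broadcast machine check.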
When an error is signaled as an LMCE, the MCE handler does not need to
rendezvous with the other logical processors, unlike on earlier
processors, which broadcast machine check errors to all logical
processors.
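
The resulting decision can be sketched as follows, assuming a simplified
severity scale (the enum is a hypothetical stand-in for the kernel's
MCE_*_SEVERITY values) and plain integers instead of real MSR reads. For
a broadcast MCE the CPU joins the mce_start()/mce_end() barrier and the
monarch CPU grades the error in mce_reign(); for a local MCE no other CPU
will do that, so the handler has to decide on its own whether to panic,
still honoring the "tolerant < 3" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_LMCES (1ULL << 3) /* local MCE signaled */

/* Hypothetical, simplified severity scale; only the ordering matters. */
enum severity { SEV_NO = 0, SEV_KEEP, SEV_AR, SEV_PANIC };
static_assert(SEV_PANIC > SEV_AR, "panic must be the highest severity");

struct mce_decision {
	bool rendezvous; /* join the mce_start()/mce_end() barrier */
	bool panic_here; /* this CPU must panic by itself */
};

/* Hypothetical helper mirroring the control flow of the patch. */
static struct mce_decision mce_decide(uint64_t mcgstatus,
				      enum severity worst, int tolerant)
{
	struct mce_decision d = { false, false };

	if (!(mcgstatus & MCG_STATUS_LMCES)) {
		/* Broadcast MCE: synchronize with the other CPUs; the
		 * monarch normally makes the panic decision. */
		d.rendezvous = true;
		return d;
	}
	/* Local MCE: nobody else grades this error, so decide here. */
	d.panic_here = (worst >= SEV_PANIC && tolerant < 3);
	return d;
}
```

With tolerant set to 3 the sketch, like the patch, never panics locally.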
See http://www.intel.com/sdm, Volume 3, Chapter 15, for more information
on the MSRs involved and for the documentation on Local MCE.
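
A sketch of the enabling side (the patch calls intel_init_lmce(), which
is not part of this diff): LMCE is usable only when IA32_MCG_CAP bit 27
(MCG_LMCE_P) is set and firmware has both set bit 20 (LMCE On) and locked
IA32_FEATURE_CONTROL; the OS then sets bit 0 of IA32_MCG_EXT_CTL
(MSR 0x4D0). The helper names below are hypothetical, and register values
are passed in as plain integers because reading MSRs requires ring 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as documented in the Intel SDM, Vol. 3, Chapter 15. */
#define MSR_IA32_MCG_EXT_CTL   0x4d0        /* OS-controlled LMCE enable MSR */
#define MCG_LMCE_P             (1ULL << 27) /* IA32_MCG_CAP: LMCE supported */
#define FEATURE_CONTROL_LOCKED (1ULL << 0)  /* IA32_FEATURE_CONTROL: lock */
#define FEATURE_CONTROL_LMCE   (1ULL << 20) /* IA32_FEATURE_CONTROL: LMCE On */
#define MCG_EXT_CTL_LMCE_EN    (1ULL << 0)  /* IA32_MCG_EXT_CTL: OS enable */

static_assert(MCG_LMCE_P == 0x8000000ULL, "MCG_LMCE_P is bit 27");

/* Hypothetical: CPU advertises LMCE and firmware opted in and locked. */
static bool lmce_usable(uint64_t mcg_cap, uint64_t feature_control)
{
	const uint64_t need = FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_LMCE;

	return (mcg_cap & MCG_LMCE_P) && (feature_control & need) == need;
}

/* Hypothetical: value to write back to IA32_MCG_EXT_CTL to enable LMCE. */
static uint64_t mcg_ext_ctl_enable(uint64_t old_ext_ctl)
{
	return old_ext_ctl | MCG_EXT_CTL_LMCE_EN;
}
```

Requiring the lock bit matches the usual rule that IA32_FEATURE_CONTROL
settings only take effect once firmware has locked the MSR.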
Signed-off-by: Ashok Raj <ashok.raj@...el.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 25 ++++++++++++++++++++++---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 1 +
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d10aada..c130391 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1047,6 +1047,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
char *msg = "Unknown";
u64 recover_paddr = ~0ull;
int flags = MF_ACTION_REQUIRED;
+ int lmce = 0;
prev_state = ist_enter(regs);
@@ -1074,11 +1075,19 @@ void do_machine_check(struct pt_regs *regs, long error_code)
kill_it = 1;
/*
+ * Check if this MCE is signaled to only this logical processor
+ */
+ if (m.mcgstatus & MCG_STATUS_LMCES)
+ lmce = 1;
+ /*
* Go through all the banks in exclusion of the other CPUs.
* This way we don't report duplicated events on shared banks
* because the first one to see it will clear it.
+ * If this is a Local MCE, then no need to perform rendezvous.
*/
- order = mce_start(&no_way_out);
+ if (!lmce)
+ order = mce_start(&no_way_out);
+
for (i = 0; i < cfg->banks; i++) {
__clear_bit(i, toclear);
if (!test_bit(i, valid_banks))
@@ -1155,8 +1164,18 @@ void do_machine_check(struct pt_regs *regs, long error_code)
* Do most of the synchronization with other CPUs.
* When there's any problem use only local no_way_out state.
*/
- if (mce_end(order) < 0)
- no_way_out = worst >= MCE_PANIC_SEVERITY;
+ if (!lmce) {
+ if (mce_end(order) < 0)
+ no_way_out = worst >= MCE_PANIC_SEVERITY;
+ } else {
+ /*
+ * Local MCE skipped calling mce_reign()
+ * If we found a fatal error, we need to panic here.
+ */
+ if (worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
+ mce_panic("Machine check from unknown source",
+ NULL, NULL);
+ }
/*
* At insane "tolerant" levels we take no action. Otherwise
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index be3a5c6..73a2844 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -484,4 +484,5 @@ void mce_intel_feature_init(struct cpuinfo_x86 *c)
{
intel_init_thermal(c);
intel_init_cmci();
+ intel_init_lmce();
}
--
1.9.1