linux-kernel - [PATCH v4] x86: mce: kexec: switch MCE handler for kexec/kdump

Message-ID: <20150304074117.GA30501@hori1.linux.bs1.fc.nec.co.jp>
Date:	Wed, 4 Mar 2015 07:41:18 +0000
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	"Luck, Tony" <tony.luck@...el.com>
CC:	Borislav Petkov <bp@...en8.de>,
	Prarit Bhargava <prarit@...hat.com>,
	"Vivek Goyal" <vgoyal@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Junichi Nomura <j-nomura@...jp.nec.com>,
	Kiyoshi Ueda <k-ueda@...jp.nec.com>
Subject: [PATCH v4] x86: mce: kexec: switch MCE handler for kexec/kdump

On Tue, Mar 03, 2015 at 06:09:27PM +0000, Luck, Tony wrote:
> +static void machine_check_under_kdump(struct pt_regs *regs, long error_code)
> +{
> +	if (mca_cfg.kdump_cpu == smp_processor_id())
> +		pr_emerg("MCE triggered when kdumping. If you are lucky enough, you will have a kdump. Otherwise, this is a dying message.\n");
> 
> I'm worried about the SRAR case here.  Your code just returns, which will trigger the same machine check again. The system will spin forever printing this message.

You're right.

> I think you have to look at MCG_STATUS and scan the machine check banks to make a choice.  There are some simple cases:
> 
>   MCG_STATUS.RIPV=0 -> cannot return (where will the cpu go - you have no idea!)
>   SRAO -> safe to just return
>   SRAR -> should not return
> 
> But the rest may require some thought.  If there is a PCC=1 error, then you may end up with a corrupt dump. Perhaps this case will already be covered by RIPV==0?

A PCC=1 error is defined as a UC error in the SDM, and our severity assessor
returns MCE_PANIC_SEVERITY for it, so kdump should abort in that case.
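
To make the RIPV=0 and PCC=1 cases concrete, here is a minimal user-space
sketch (not part of the patch). The bit positions are the architectural ones
from the SDM; the helper name context_lost() is hypothetical and only covers
these two cases. SRAO/SRAR are left to the severity grading.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (IA32_MCG_STATUS, IA32_MCi_STATUS). */
#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP is valid */
#define MCI_STATUS_VAL  (1ULL << 63)  /* bank holds a valid error */
#define MCI_STATUS_PCC  (1ULL << 57)  /* processor context corrupt */

/*
 * Hypothetical helper: true when the interrupted context cannot be
 * trusted, either because there is no valid restart IP or because a
 * valid bank reports processor context corruption.
 */
static bool context_lost(uint64_t mcg_status, uint64_t bank_status)
{
	if (!(mcg_status & MCG_STATUS_RIPV))
		return true;
	if ((bank_status & MCI_STATUS_VAL) && (bank_status & MCI_STATUS_PCC))
		return true;
	return false;
}
```

A bank whose VAL bit is clear is ignored here even if PCC is set, since its
contents are not meaningful.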

So I added severity checking to the new MCE handler in the following patch,
borrowing some more code from do_machine_check(). Please also see the
changelog in the patch description.
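
As a rough illustration of that check (a stand-alone sketch, not the patch
code itself): the enum below uses the v4 ordering, with the first entry
assumed from the kernel source of the time, and the two helper names are
hypothetical. With the pre-v4 order, where UC sat below AR, a UC error would
have fallen under the ">= MCE_AR_SEVERITY" threshold.

```c
#include <stddef.h>

/* Severity ordering as of v4 of the patch: UC ranks above AR. */
enum severity_level {
	MCE_NO_SEVERITY,	/* assumed first entry, not shown in the hunk */
	MCE_KEEP_SEVERITY,
	MCE_SOME_SEVERITY,
	MCE_AO_SEVERITY,
	MCE_AR_SEVERITY,
	MCE_UC_SEVERITY,
	MCE_PANIC_SEVERITY,
};

/* Hypothetical helper: worst severity among the per-bank gradings. */
static int worst_severity(const int *sev, size_t nbanks)
{
	int worst = MCE_NO_SEVERITY;

	for (size_t i = 0; i < nbanks; i++)
		if (sev[i] > worst)
			worst = sev[i];
	return worst;
}

/* Mirrors the threshold in the patch: AR or anything worse aborts kdump. */
static int kdump_must_panic(int worst)
{
	return worst >= MCE_AR_SEVERITY;
}
```

So a set of banks graded at most AO lets the dump continue, while a single
bank graded AR, UC, or PANIC stops it.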

Thanks,
Naoya Horiguchi
---
From 6c0d53b80d6a45568c250d7b0d227f3399a2f2f2 Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Date: Wed, 4 Mar 2015 16:13:30 +0900
Subject: [PATCH] x86: mce: kexec: switch MCE handler for kexec/kdump

kexec disables (or "shoots down") all CPUs other than the crashing CPU before
entering the 2nd kernel. But the MCE handler is still enabled after that,
so if an MCE happens and is broadcast to all CPUs after the main thread starts
the 2nd kernel (which might not have initialized the MCE device yet, or might
decide not to enable it), the MCE handler runs only on the other CPUs (not on
the main thread), leading to a kernel panic when MCE synchronization times out.
The user-visible effect of this bug is kdump failure.

Our standard MCE handler do_machine_check() makes some assumptions about the
system's state, and it's hard to alter it to cover the kexec/kdump context, so
let's add another kdump-specific handler and switch to it.

Note that this problem has existed since the current MCE handler was implemented
in 2.6.32, and recently commit 716079f66eac ("mce: Panic when a core has reached
a timeout") made it more visible by changing the default behavior of the
synchronization timeout from "ignore" to "panic".

Signed-off-by: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Cc: <stable@...r.kernel.org>        [2.6.32+]
---
ChangeLog v3 -> v4:
- fixed AR and UC order in enum severity_level because UC is more severe than
  AR by definition. The current code happens not to be affected by the wrong order.
- check severity in machine_check_under_kdump(), and call mce_panic() if the
  resultant severity is as bad as or worse than MCE_AR_SEVERITY.
- use static global variable kdump_cpu instead of mca_cfg->kdump_cpu
- reduce "#ifdef CONFIG_KEXEC"
- add "#ifdef CONFIG_X86_MCE" for declaration of machine_check_under_kdump()
  in mce.h
- update comment on switch_mce_handler_for_kdump()

ChangeLog v2 -> v3
- go to "switch MCE handler" approach

ChangeLog v1 -> v2
- clear MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE instead of using
  global flag to ignore MCE events.
- fixed the description of the problem
---
 arch/x86/include/asm/mce.h                |  7 +++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +-
 arch/x86/kernel/cpu/mcheck/mce.c          | 82 +++++++++++++++++++++++++++++++
 arch/x86/kernel/crash.c                   |  3 ++
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 51b26e895933..223c4897a637 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -175,6 +175,13 @@ static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
 #endif
 
 int mce_available(struct cpuinfo_x86 *c);
+#ifdef CONFIG_KEXEC
+#ifdef CONFIG_X86_MCE
+void switch_mce_handler_for_kdump(void);
+#else
+static inline void switch_mce_handler_for_kdump(void) {}
+#endif
+#endif
 
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 10b46906767f..62e41dee907f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -8,8 +8,8 @@ enum severity_level {
 	MCE_KEEP_SEVERITY,
 	MCE_SOME_SEVERITY,
 	MCE_AO_SEVERITY,
-	MCE_UC_SEVERITY,
 	MCE_AR_SEVERITY,
+	MCE_UC_SEVERITY,
 	MCE_PANIC_SEVERITY,
 };
 
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3112b79ace8e..a9b23ea86e56 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1219,6 +1219,88 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 }
 EXPORT_SYMBOL_GPL(do_machine_check);
 
+#ifdef CONFIG_KEXEC
+static int kdump_cpu;
+
+/*
+ * kdump-specific machine check handler
+ *
+ * When kexec/kdump is running, what the MCE handler is expected to do
+ * changes depending on whether the CPU is running the main thread or not.
+ *
+ * The crashing CPU, controlling the whole system exclusively, should try as
+ * hard as possible to get a kdump even if an MCE happens concurrently, because
+ * some types of MCEs (for example, uncorrected errors like SRAO)
+ * are not fatal or don't ruin the reliability of the kdump (consider that an
+ * MCE can hit another CPU, in which case corrupted data is never consumed).
+ * If an MCE critically breaks the kdump operation, we are unlucky, so let's
+ * accept whatever fate the HW causes, hoping a dying message reaches admins.
+ *
+ * The other CPUs are supposed to be quiet during kexec/kdump, so after the
+ * crashing CPU has shot them down, they should not do anything except clear
+ * MCG_STATUS (without this the system is reset, which is undesirable).
+ * Note that this is also true after the crashing CPU enters the 2nd kernel.
+ */
+static void machine_check_under_kdump(struct pt_regs *regs, long error_code)
+{
+	struct mce m, final;
+	char *msg = NULL;
+	char *nmsg = NULL;
+	int i;
+	int worst = 0;
+	int severity;
+
+	if (!mca_cfg.banks)
+		goto out;
+	if (kdump_cpu != smp_processor_id())
+		goto clear_mcg_status;
+
+	mce_gather_info(&m, regs);
+	for (i = 0; i < mca_cfg.banks; i++) {
+		m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+		if (m.status & MCI_STATUS_VAL)
+			if (quirk_no_way_out)
+				quirk_no_way_out(i, &m, regs);
+		severity = mce_severity(&m, mca_cfg.tolerant, &nmsg, true);
+		if (severity > worst) {
+			final = m;
+			worst = severity;
+			msg = nmsg;
+		}
+	}
+
+	if (worst >= MCE_AR_SEVERITY)
+		mce_panic("Unignorable machine check under kdumping", &final, msg);
+
+	pr_emerg("MCE triggered under kdumping. If you are lucky enough, you will have a kdump. Otherwise, this is a dying message.\n");
+
+clear_mcg_status:
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
+out:
+	sync_core();
+}
+
+/*
+ * Switch the MCE handler to a kdump-specific one (called from kdump entry code)
+ *
+ * The standard MCE handler do_machine_check() is not designed for the
+ * kexec/kdump context, where we can't expect MCE recovery and logging to work
+ * because the kernel might be unstable (all CPUs except one must be as quiet
+ * as possible).
+ *
+ * Moreover, in such a situation getting a kdump takes priority over handling
+ * MCEs, because what users are really interested in is finding what caused
+ * the crash, not what caused the crash dump to fail.
+ *
+ * So let's switch to an MCE handler suitable for the kexec/kdump situation.
+ */
+void switch_mce_handler_for_kdump(void)
+{
+	kdump_cpu = smp_processor_id();
+	machine_check_vector = machine_check_under_kdump;
+}
+#endif
+
 #ifndef CONFIG_MEMORY_FAILURE
 int memory_failure(unsigned long pfn, int vector, int flags)
 {
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 6f3baedcb6f6..273805e772f6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -34,6 +34,7 @@
 #include <asm/cpu.h>
 #include <asm/reboot.h>
 #include <asm/virtext.h>
+#include <asm/mce.h>
 
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
@@ -166,6 +167,8 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	/* The kernel is broken so disable interrupts */
 	local_irq_disable();
 
+	switch_mce_handler_for_kdump();
+
 	kdump_nmi_shootdown_cpus();
 
 	/*
-- 
1.9.3