Date:	Wed, 30 Sep 2009 23:48:59 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Borislav Petkov <petkovbb@...glemail.com>,
	Andi Kleen <andi@...stfloor.org>, x86@...nel.org,
	linux-kernel@...r.kernel.org,
	Borislav Petkov <borislav.petkov@....com>, torvalds@...l.org
Subject: Re: x86: mce: Please revert
	22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9


* Ingo Molnar <mingo@...e.hu> wrote:

> 
> * Borislav Petkov <petkovbb@...glemail.com> wrote:
> 
> > On Wed, Sep 30, 2009 at 04:09:04PM +0200, Andi Kleen wrote:
> > > 
> > > Can someone please revert this incorrect commit that's in mainline
> > > now? 
> > > 
> > > Obviously kernels compiled with AMD support can still run on non
> > > AMD systems, so messages like this can never be removed at compile time.
> > > 
> > > -andi
> > > 
> > > Commit 22223c9b417be5fd0ab2cf9ad17eb7bd1e19f7b9
> > > Author: Borislav Petkov <borislav.petkov@....com>
> > > Date:   Tue Jul 28 14:47:10 2009 +0200
> > > 
> > >     x86, mce: do not compile mcelog message on AMD
> > >     
> > >     Now that decoding is done in-kernel, suppress mcelog message part.
> > >     
> > >     CC: Andi Kleen <andi@...stfloor.org>
> > >     Signed-off-by: Borislav Petkov <borislav.petkov@....com>
> > > 
> > > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> > > index b82866f..9bfe9d2 100644
> > > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > > @@ -222,7 +222,10 @@ static void print_mce_head(void)
> > >  static void print_mce_tail(void)
> > >  {
> > >         printk(KERN_EMERG "This is not a software problem!\n"
> > > -              "Run through mcelog --ascii to decode and contact your hardware vendor\n");
> > > +#if (!defined(CONFIG_EDAC) || !defined(CONFIG_CPU_SUP_AMD))
> > 
> > how about
> > 
> > 	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> > 		pr_emerg("Run through mcelog --ascii to decode and contact your hardware vendor\n");
> > 
> > instead?
> 
> Yeah, a runtime check like that would be fine - but i'd suggest 
> something more clearly and more specifically connected to in-kernel 
> decoding: please define a new x86_mce_can_decode_errors capability 
> flag or so.
> 
> Obviously the Intel CPU side should be fixed and improved to decode 
> MCE errors in the kernel too.
> 
> Please also fix that printk to say something like:
> 
>   "MCE error decoding not supported on this CPU: run through mcelog --ascii to decode\n"
> 
> Thanks,

I.e. something like the patch below. Completely untested.

Note, while looking at the interaction of decode_mce() with the other 
MCE code i also noticed a few other things and made the following 
cleanups/fixes:

 - Fixed the decode_mce() weak alias - a weak alias is really not good
   here, it should be a proper callback. A weak alias will be overridden
   if a piece of code is built into the kernel - not good, obviously.

 - The patch initializes the callback on AMD family 10h and 11h - a
   quick glance suggests that decoding of earlier models isn't supported?

 - Added the more correct fallback printk of:

	No support for human readable MCE decoding on this CPU type.
	Transcribe the message and run it through 'mcelog --ascii' to decode.

   on CPUs that don't have a decoder.

 - Made the surrounding code more readable.

Note that the callback allows us to have a default fallback - without 
having to check the CPU versions during the printout itself. When an 
EDAC module registers itself, it can install the decode-print function.

(there's no unregister needed as this is core code.)
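
To illustrate the callback-with-default pattern outside the kernel, here is a
minimal user-space sketch. It is not the kernel code: struct mce_record,
register_decoder(), vendor_decode() and the two call counters are
hypothetical names made up for this example, and printf() stands in for
pr_emerg(). It only shows that the print path calls through a function
pointer that starts out pointing at a fallback, so no CPU check is needed at
print time.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's struct mce; only two fields kept. */
struct mce_record {
	int bank;
	unsigned long long status;
};

/* Counters so that a caller can see which decoder ran (illustration only). */
static int default_calls;
static int vendor_calls;

/* Fallback used when nothing has registered a decoder. */
static void default_decode(struct mce_record *m)
{
	(void)m;
	default_calls++;
	printf("No human readable decoding available for this record.\n");
}

/* The pointer is never NULL: it starts out pointing at the fallback. */
static void (*decode_callback)(struct mce_record *m) = default_decode;

/* Assumed vendor-specific decoder: prints the low 16 bits of status. */
static void vendor_decode(struct mce_record *m)
{
	vendor_calls++;
	printf("bank %d: error code 0x%llx\n", m->bank, m->status & 0xffff);
}

/* A decoder module would call this once from its init code. */
static void register_decoder(void (*fn)(struct mce_record *m))
{
	decode_callback = fn;
}

/* The print path does not check the CPU type; it just calls the pointer. */
static void print_record(struct mce_record *m)
{
	printf("Bank %d: %016llx\n", m->bank, m->status);
	decode_callback(m);
}
```

Calling print_record() before register_decoder(vendor_decode) goes through
default_decode(); calling it afterwards goes through vendor_decode(), with no
change to print_record() itself.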

	Ingo

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index b608a64..f52d219 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -133,6 +133,8 @@ static inline void winchip_mcheck_init(struct cpuinfo_x86 *c) {}
 static inline void enable_p5_mce(void) {}
 #endif
 
+extern void (*x86_decode_mce_callback)(struct mce *m);
+
 void mce_setup(struct mce *m);
 void mce_log(struct mce *m);
 DECLARE_PER_CPU(struct sys_device, mce_dev);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 183c345..adc8e2a 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -85,6 +85,18 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
 static DEFINE_PER_CPU(struct mce, mces_seen);
 static int			cpu_missing;
 
+static void default_decode_mce(struct mce *m)
+{
+	pr_emerg("No support for human readable MCE decoding on this CPU type.\n");
+	pr_emerg("Transcribe the message and run it through 'mcelog --ascii' to decode.\n");
+}
+
+/*
+ * CPU/chipset specific EDAC code can register a callback here to print
+ * MCE errors in a human-readable form:
+ */
+void (*x86_decode_mce_callback)(struct mce *m) = default_decode_mce;
+EXPORT_SYMBOL(x86_decode_mce_callback);
 
 /* MCA banks polled by the period polling timer for corrected events */
 DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
@@ -165,46 +177,47 @@ void mce_log(struct mce *mce)
 	set_bit(0, &mce_need_notify);
 }
 
-void __weak decode_mce(struct mce *m)
-{
-	return;
-}
-
 static void print_mce(struct mce *m)
 {
-	printk(KERN_EMERG
-	       "CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
+	pr_emerg("CPU %d: Machine Check Exception: %16Lx Bank %d: %016Lx\n",
 	       m->extcpu, m->mcgstatus, m->bank, m->status);
+
 	if (m->ip) {
-		printk(KERN_EMERG "RIP%s %02x:<%016Lx> ",
+		pr_emerg("RIP%s %02x:<%016Lx> ",
 		       !(m->mcgstatus & MCG_STATUS_EIPV) ? " !INEXACT!" : "",
 		       m->cs, m->ip);
+
 		if (m->cs == __KERNEL_CS)
 			print_symbol("{%s}", m->ip);
-		printk(KERN_CONT "\n");
+		pr_cont("\n");
 	}
-	printk(KERN_EMERG "TSC %llx ", m->tsc);
+
+	pr_emerg("TSC %llx ", m->tsc);
 	if (m->addr)
-		printk(KERN_CONT "ADDR %llx ", m->addr);
+		pr_cont("ADDR %llx ", m->addr);
 	if (m->misc)
-		printk(KERN_CONT "MISC %llx ", m->misc);
-	printk(KERN_CONT "\n");
-	printk(KERN_EMERG "PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
+		pr_cont("MISC %llx ", m->misc);
+
+	pr_cont("\n");
+	pr_emerg("PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x\n",
 			m->cpuvendor, m->cpuid, m->time, m->socketid,
 			m->apicid);
 
-	decode_mce(m);
+	/*
+	 * Print out human-readable details about the MCE error,
+	 * (if the CPU has an implementation for that):
+	 */
+	x86_decode_mce_callback(m);
 }
 
 static void print_mce_head(void)
 {
-	printk(KERN_EMERG "\nHARDWARE ERROR\n");
+	pr_emerg("\nHARDWARE ERROR\n");
 }
 
 static void print_mce_tail(void)
 {
-	printk(KERN_EMERG "This is not a software problem!\n"
-	       "Run through mcelog --ascii to decode and contact your hardware vendor\n");
+	pr_emerg("This is not a software problem!\n");
 }
 
 #define PANIC_TIMEOUT 5 /* 5 seconds */
diff --git a/drivers/edac/edac_mce_amd.c b/drivers/edac/edac_mce_amd.c
index 0c21c37..4fee380 100644
--- a/drivers/edac/edac_mce_amd.c
+++ b/drivers/edac/edac_mce_amd.c
@@ -362,7 +362,7 @@ static inline void amd_decode_err_code(unsigned int ec)
 		pr_warning("Huh? Unknown MCE error 0x%x\n", ec);
 }
 
-void decode_mce(struct mce *m)
+static void amd_decode_mce(struct mce *m)
 {
 	struct err_regs regs;
 	int node, ecc;
@@ -420,3 +420,15 @@ void decode_mce(struct mce *m)
 
 	amd_decode_err_code(m->status & 0xffff);
 }
+
+static __init int mce_amd_init(void)
+{
+	/*
+	 * We can decode MCEs for family 10h and later AMD CPUs:
+	 */
+	if ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && (boot_cpu_data.x86 >= 0x10))
+		x86_decode_mce_callback = amd_decode_mce;
+
+	return 0;
+}
+early_initcall(mce_amd_init);