Message-ID: <20090817112938.GA22794@elte.hu>
Date: Mon, 17 Aug 2009 13:29:38 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, hpa@...or.com,
ak@...ux.intel.com, tglx@...utronix.de,
Yinghai Lu <yinghai@...nel.org>,
Huang Ying <ying.huang@...el.com>,
"Rafael J. Wysocki" <rjw@...k.pl>,
linux-tip-commits@...r.kernel.org
Subject: [PATCH] x86, mce: Don't initialize MCEs on unknown CPUs
* Hidetoshi Seto <seto.hidetoshi@...fujitsu.com> wrote:
> The old mce code doesn't take a bootlog.
>
> One possibility is: if the BIOS doesn't clear the status in the
> banks, the new mce code will try to log such junk. If the data is
> pure junk but can be decoded as a valid log with the MISCV or ADDRV
> bit set, and if the CPU tries to access a register which is not
> implemented (e.g. IA32_MCi_MISC/ADDR), then such an access might
> cause a general protection exception. (ref. ASDM 3A 15.3.2.3)
>
> I'm just guessing...
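
To make the scenario in the quoted text concrete, here is a minimal user-space sketch in C of how a logger might decide which extra registers to read for a bank. The bit positions for VAL, MISCV and ADDRV follow the IA32_MCi_STATUS layout in the Intel SDM; struct bank_reads and plan_bank_reads() are hypothetical names used only for illustration, not kernel code. The point is that if the BIOS leaves junk in the status register, the junk alone decides whether IA32_MCi_MISC or IA32_MCi_ADDR gets read:

```c
#include <stdint.h>

/* Bit positions in IA32_MCi_STATUS, per the Intel SDM. */
#define MCI_STATUS_VAL   (1ULL << 63)	/* status register holds a valid error */
#define MCI_STATUS_MISCV (1ULL << 59)	/* IA32_MCi_MISC contains valid data */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* IA32_MCi_ADDR contains valid data */

/* Hypothetical: what a logger would do with one bank. */
struct bank_reads {
	int log;	/* record this bank at all? */
	int read_misc;	/* would read IA32_MCi_MISC */
	int read_addr;	/* would read IA32_MCi_ADDR */
};

/* Hypothetical helper: derive the register reads from the raw status. */
static struct bank_reads plan_bank_reads(uint64_t status)
{
	struct bank_reads r = { 0, 0, 0 };

	/* Nothing is logged unless the VAL bit is set. */
	if (!(status & MCI_STATUS_VAL))
		return r;

	r.log = 1;
	/* Stale bits here trigger reads of possibly unimplemented MSRs. */
	r.read_misc = !!(status & MCI_STATUS_MISCV);
	r.read_addr = !!(status & MCI_STATUS_ADDRV);
	return r;
}
```

A status value with VAL and MISCV set, whether genuine or left over from the BIOS, makes plan_bank_reads() request the MISC read. On a CPU that does not implement that register, this is the access that could fault.
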

btw., i found the bug - it's due to:

  # CONFIG_CPU_SUP_INTEL is not set

which in essence disables the MCE quirks in mce_cpu_quirks().

Quirk handlers like this, if they see an 'unknown' CPU, should assume
the worst and either apply the maximum set of quirks or disable MCE.
I went for the second option as it's the safer one - see the fix
below.
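
The "disable on unknown CPU" choice can be sketched in plain user-space C as follows. This is only an illustration of the pattern, under stated assumptions: the enum, the two structs and the names cpu_quirks() and mcheck_setup() are hypothetical stand-ins for the kernel's types and functions, and only the two Intel checks mentioned in this mail are modeled.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's vendor IDs and cpuinfo_x86. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_UNKNOWN };

struct cpu {
	enum vendor vendor;
	int family;
	int model;
};

/* Hypothetical subset of the MCE setup state. */
struct mce_state {
	int bank0_init;	/* 1 = program bank 0 during init */
	int bootlog;	/* -1 = not set by the user, 0 = off, 1 = on */
	int disabled;	/* 1 = MCE support not enabled */
};

/* Apply quirks; refuse if we cannot tell which quirks are needed. */
static int cpu_quirks(const struct cpu *c, struct mce_state *s)
{
	if (c->vendor == VENDOR_UNKNOWN)
		return -EOPNOTSUPP;

	if (c->vendor == VENDOR_INTEL) {
		/* Older family 6 models: leave bank 0 alone. */
		if (c->family == 6 && c->model < 0x1A)
			s->bank0_init = 0;
		/* Pentium M and earlier: no boot-time log by default. */
		if (c->family == 6 && c->model <= 13 && s->bootlog < 0)
			s->bootlog = 0;
	}
	return 0;
}

/* Caller treats a quirk failure like any other init failure. */
static void mcheck_setup(const struct cpu *c, struct mce_state *s)
{
	if (cpu_quirks(c, s) < 0) {
		s->disabled = 1;
		return;
	}
	/* The real code would install the machine check handler here. */
}
```

The design point is that the quirk handler returns an error instead of silently doing nothing, so a CPU whose support code was configured out never reaches the init path with its workarounds missing.
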
Ingo
---------------------->
From e412cd257e0d51e0ecbb89f50953835b5a0681b2 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...e.hu>
Date: Mon, 17 Aug 2009 10:19:00 +0200
Subject: [PATCH] x86, mce: Don't initialize MCEs on unknown CPUs

An older test-box started hanging at the following point during
bootup:

  [ 0.022996] Mount-cache hash table entries: 512
  [ 0.024996] Initializing cgroup subsys debug
  [ 0.025996] Initializing cgroup subsys cpuacct
  [ 0.026995] Initializing cgroup subsys devices
  [ 0.027995] Initializing cgroup subsys freezer
  [ 0.028995] mce: CPU supports 5 MCE banks

I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit
machine check code on 32bit"), which utilizes the MCE code on
32-bit systems too.

The problem is caused by this detail in my config:

  # CONFIG_CPU_SUP_INTEL is not set

This disables the quirks in mce_cpu_quirks() but still enables
MCE support - which then hangs due to the missing quirk
workaround needed on this CPU:

	if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0)
		mce_banks[0].init = 0;

The safe solution is to not initialize MCEs if we don't know on
what CPU we are running (or if that CPU's support code got
disabled in the config).

Also be a bit more defensive on 32-bit systems: don't do a
boot-time dump of pending MCEs, not just on the specific system
that we found a problem with (Pentium-M), but on earlier ones as
well.

Now this problem is probably not common and disabling CPU
support is rare - but being more defensive in something we
turned on for a wide range of CPUs is still prudent.

Cc: Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
LKML-Reference: Message-ID: <4A88E3E4.40506@...fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
arch/x86/kernel/cpu/mcheck/mce.c | 19 ++++++++++++++-----
1 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a0c2910..0121304 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1226,8 +1226,13 @@ static void mce_init(void)
 }

 /* Add per CPU specific workarounds here */
-static void mce_cpu_quirks(struct cpuinfo_x86 *c)
+static int mce_cpu_quirks(struct cpuinfo_x86 *c)
 {
+	if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
+		pr_info("MCE: unknown CPU type - not enabling MCE support.\n");
+		return -EOPNOTSUPP;
+	}
+
 	/* This should be disabled by the BIOS, but isn't always */
 	if (c->x86_vendor == X86_VENDOR_AMD) {
 		if (c->x86 == 15 && banks > 4) {
@@ -1274,14 +1279,19 @@ static void mce_cpu_quirks(struct cpuinfo_x86 *c)
 			monarch_timeout < 0)
 			monarch_timeout = USEC_PER_SEC;

-		/* There are also broken BIOSes on some Pentium M systems. */
-		if (c->x86 == 6 && c->x86_model == 13 && mce_bootlog < 0)
+		/*
+		 * There are also broken BIOSes on some Pentium M and
+		 * earlier systems:
+		 */
+		if (c->x86 == 6 && c->x86_model <= 13 && mce_bootlog < 0)
 			mce_bootlog = 0;
 	}
 	if (monarch_timeout < 0)
 		monarch_timeout = 0;
 	if (mce_bootlog != 0)
 		mce_panic_timeout = 30;
+
+	return 0;
 }

 static void __cpuinit mce_ancient_init(struct cpuinfo_x86 *c)
@@ -1342,11 +1352,10 @@ void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
 	if (!mce_available(c))
 		return;

-	if (mce_cap_init() < 0) {
+	if (mce_cap_init() < 0 || mce_cpu_quirks(c) < 0) {
 		mce_disabled = 1;
 		return;
 	}
-	mce_cpu_quirks(c);

 	machine_check_vector = do_machine_check;