[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1439396985-12812-10-git-send-email-bp@alien8.de>
Date: Wed, 12 Aug 2015 18:29:41 +0200
From: Borislav Petkov <bp@...en8.de>
To: Ingo Molnar <mingo@...nel.org>
Cc: Tony Luck <tony.luck@...el.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: [PATCH 09/13] x86/mce: Reenable CMCI banks when swiching back to interrupt mode
From: Xie XiuQi <xiexiuqi@...wei.com>
Zhang Liguang reported the following issue:
1) System detects a CMCI storm on the current CPU.
2) Kernel disables the CMCI interrupt on banks owned by the current CPU and
switches to poll mode
3) After the CMCI storm subsides, kernel switches back to interrupt mode
4) We expect the system to reenable the CMCI interrupt on banks owned by
the current CPU
mce_intel_adjust_timer
|-> cmci_reenable
|-> cmci_discover # owned banks are ignored here
static void cmci_discover(int banks)
...
for (i = 0; i < banks; i++) {
...
if (test_bit(i, owned)) # ownd banks is ignore here
continue;
So convert cmci_storm_disable_banks() to cmci_toggle_interrupt_mode()
which controls whether to enable or disable CMCI interrupts with its
argument.
NB: We cannot clear the owned bit because the banks won't be polled,
otherwise. See
27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")
for more info.
Reported-by: Zhang Liguang <zhangliguang@...wei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@...wei.com>
Cc: <stable@...r.kernel.org> # v3.15+
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: huawei.libin@...wei.com
Cc: Ingo Molnar <mingo@...hat.com>
Cc: linux-edac <linux-edac@...r.kernel.org>
Cc: rui.xiang@...wei.com
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Tony Luck <tony.luck@...el.com>
Cc: x86-ml <x86@...nel.org>
Link: http://lkml.kernel.org/r/1439347871-2702-1-git-send-email-xiexiuqi@huawei.com
Signed-off-by: Borislav Petkov <bp@...e.de>
---
arch/x86/kernel/cpu/mcheck/mce_intel.c | 41 +++++++++++++++++++---------------
1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index c5c003291861..1e8bb6c94f14 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -146,6 +146,27 @@ void mce_intel_hcpu_update(unsigned long cpu)
per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
}
+static void cmci_toggle_interrupt_mode(bool on)
+{
+ unsigned long flags, *owned;
+ int bank;
+ u64 val;
+
+ raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+ owned = this_cpu_ptr(mce_banks_owned);
+ for_each_set_bit(bank, owned, MAX_NR_BANKS) {
+ rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+
+ if (on)
+ val |= MCI_CTL2_CMCI_EN;
+ else
+ val &= ~MCI_CTL2_CMCI_EN;
+
+ wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+ }
+ raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
+
unsigned long cmci_intel_adjust_timer(unsigned long interval)
{
if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
@@ -175,7 +196,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
*/
if (!atomic_read(&cmci_storm_on_cpus)) {
__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
- cmci_reenable();
+ cmci_toggle_interrupt_mode(true);
cmci_recheck();
}
return CMCI_POLL_INTERVAL;
@@ -186,22 +207,6 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
}
}
-static void cmci_storm_disable_banks(void)
-{
- unsigned long flags, *owned;
- int bank;
- u64 val;
-
- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
- owned = this_cpu_ptr(mce_banks_owned);
- for_each_set_bit(bank, owned, MAX_NR_BANKS) {
- rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
- val &= ~MCI_CTL2_CMCI_EN;
- wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
- }
- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
-}
-
static bool cmci_storm_detect(void)
{
unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
@@ -223,7 +228,7 @@ static bool cmci_storm_detect(void)
if (cnt <= CMCI_STORM_THRESHOLD)
return false;
- cmci_storm_disable_banks();
+ cmci_toggle_interrupt_mode(false);
__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
r = atomic_add_return(1, &cmci_storm_on_cpus);
mce_timer_kick(CMCI_STORM_INTERVAL);
--
2.5.0.rc2.28.g6003e7f
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists