Message-Id: <1404925766-32253-1-git-send-email-hskinnemoen@google.com>
Date:	Wed,  9 Jul 2014 10:09:20 -0700
From:	Havard Skinnemoen <hskinnemoen@...gle.com>
To:	Tony Luck <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>
Cc:	linux-kernel@...r.kernel.org,
	Havard Skinnemoen <hskinnemoen@...gle.com>,
	Ewout van Bekkum <ewout@...gle.com>
Subject: [PATCH 0/6] x86 mce fixes

The following series contains a few fixes we came up with while testing the MCE
handling on our servers in the lab. These should fix the following symptoms:

  - Once entering CMCI storm mode, we would never exit. This was because we set
    the check_interval to be shorter than 30 seconds, so the condition to exit
    storm mode could never become true.
  - After a storm, the MCE banks previously handled by a CPU could not be
    reclaimed.
  - After a kexec reboot, none of the MCE banks could be claimed by any CPU.
  - Duplicate MCEs were being reported in some circumstances (e.g. with
    mce=no_cmci and/or mce=3).
  - Crashes because the polling timer was added multiple times.
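
To illustrate the first symptom, here is a toy model, not the kernel code. It
assumes the poll interval doubles after every quiet poll, is capped at
check_interval, and that storm mode is only left once the interval reaches the
30-second threshold. The names STORM_EXIT_THRESHOLD_S, next_poll_interval()
and polls_until_storm_exit() are hypothetical and exist only for this sketch.

```c
#include <assert.h>

/* Hypothetical threshold in seconds, taken from the 30-second figure above. */
#define STORM_EXIT_THRESHOLD_S 30UL

/* Assumed backoff: double the interval after a quiet poll, capped at check_interval. */
static unsigned long next_poll_interval(unsigned long iv, unsigned long check_interval)
{
	unsigned long doubled = iv * 2;

	return doubled < check_interval ? doubled : check_interval;
}

/*
 * Count the quiet polls needed before the interval reaches the exit
 * threshold, starting from a 1-second interval. Returns -1 if the
 * threshold is not reached within max_polls polls.
 */
static long polls_until_storm_exit(unsigned long check_interval, long max_polls)
{
	unsigned long iv = 1;
	long n;

	assert(check_interval > 0);

	for (n = 0; n <= max_polls; n++) {
		if (iv >= STORM_EXIT_THRESHOLD_S)
			return n;
		iv = next_poll_interval(iv, check_interval);
	}
	return -1;
}
```

In this model, polls_until_storm_exit(300, 1000) returns after a handful of
polls, while polls_until_storm_exit(10, 1000) returns -1: with a cap below the
threshold, the exit condition is never met no matter how long the system stays
quiet.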

We're not sure these patches are the best way to fix these issues, and they
may introduce new, subtle bugs, but they're the best we managed to come up
with. Please take a good look and tell us what we got wrong.

Ewout did all the legwork in getting this implemented and tested, while I've
been providing advice and reviews.

Signed-off-by: Ewout van Bekkum <ewout@...gle.com>
Signed-off-by: Havard Skinnemoen <hskinnemoen@...gle.com>

Ewout van Bekkum (6):
  x86-mce: Modify CMCI poll interval to adjust for small check_interval
    values.
  x86-mce: Modify CMCI storm exit to reenable instead of rediscover
    banks.
  x86-mce: Clear CMCI enable on all claimed CMCI banks before reboot.
  x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.
  x86-mce: check if no_way_out applies before deciding not to clear MCE
    banks.
  x86-mce: ensure the MCP timer is not already set in the mce_timer_fn.

 arch/x86/kernel/cpu/mcheck/mce-internal.h |  2 +
 arch/x86/kernel/cpu/mcheck/mce.c          | 39 +++++++++++--
 arch/x86/kernel/cpu/mcheck/mce_intel.c    | 95 ++++++++++++++++++++++++-------
 3 files changed, 111 insertions(+), 25 deletions(-)

-- 
2.0.0.526.g5318336
