Message-ID: <6ccf5413-ce2a-bbb9-0203-b0cc9fe2e641@profihost.ag>
Date:   Mon, 29 Oct 2018 11:45:04 +0100
From:   Daniel Aberger - Profihost AG <d.aberger@...fihost.ag>
To:     tony.luck@...el.com, bp@...en8.de, mingo@...hat.com, hpa@...or.com,
        x86@...nel.org, linux-edac@...r.kernel.org,
        linux-kernel@...r.kernel.org
Cc:     s.priebe@...fihost.ag, n.fahldieck@...fihost.ag,
        p.kramme@...fihost.ag
Subject: MCE reports errors that can't be verified

Hello,

We currently have several servers reporting faulty memory through MCE.

Example dmesg output:

[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: Machine check events logged
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 7: cc027c0000010091
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fc337d80 MISC 50202086
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1534938887 SOCKET 1 APIC 20 microcode 3d
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: Machine check events logged
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 7: 8c00004000010091
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fb117d40 MISC 4268e886
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1534938887 SOCKET 1 APIC 24 microcode 3d
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: CPU 15: Machine Check: 0 Bank 7: cc00008000010091
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: TSC 0 ADDR 70fb1b3ec0 MISC 142189886
[Mi Aug 22 13:54:47 2018] mce: [Hardware Error]: PROCESSOR 0:306f2 TIME 1534938887 SOCKET 1 APIC 26 microcode 3d

Normally we verify such errors by checking the IPMI event log, but
nothing shows up there.

ras-mc-ctl does not report any errors either.
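
For readers who want to interpret the raw "Bank 7" status words in the
log above, here is a minimal sketch of a decoder. It assumes the
architectural IA32_MCi_STATUS bit layout described in the Intel SDM
applies to this CPU, and that the corrected error count field (bits
52:38) is implemented. The names decode_mci_status and MEM_TXN are made
up for illustration and are not part of any kernel or rasdaemon tool.

```python
# Sketch: decode IA32_MCi_STATUS words as printed after "Bank N:" in the log.
# Bit positions are the architectural ones from the Intel SDM (assumed to
# apply to this CPU model); all names here are illustrative only.

# MMM field of a memory controller error code (000F 0000 1MMM CCCC).
MEM_TXN = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its main fields."""
    mcacod = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was lost
        "uncorrected": bool((status >> 61) & 1),  # UC
        "enabled": bool((status >> 60) & 1),      # EN
        "miscv": bool((status >> 59) & 1),        # MISC register holds valid data
        "addrv": bool((status >> 58) & 1),        # ADDR register holds valid data
        "pcc": bool((status >> 57) & 1),          # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38, if implemented
        "mscod": (status >> 16) & 0xFFFF,         # model-specific error code
        "mcacod": mcacod,                         # architectural error code
    }
    # Memory controller errors match 000F 0000 1MMM CCCC; bit 12 (F) is ignored.
    if (mcacod & 0xEF80) == 0x0080:
        fields["memory_error"] = MEM_TXN.get((mcacod >> 4) & 0x7, "reserved")
        fields["channel"] = mcacod & 0xF  # 0xF means "channel not specified"
    return fields


if __name__ == "__main__":
    for word in ("cc027c0000010091", "8c00004000010091", "cc00008000010091"):
        print(word, decode_mci_status(int(word, 16)))
```

Under these assumptions, all three status words decode as corrected
(UC clear) memory read errors on channel 1 with valid ADDR and MISC
registers, which would fit errors that are counted by the memory
controller without necessarily being escalated to the BMC event log.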

We see this problem with kernel 4.12.0, based on the openSUSE SLE15
branch at commit a906b62b3f80679eac4f38373492a871c5f3568e.

Is this a bug in the kernel's MCE handling?


-- 
Kind regards,
  Daniel Aberger
Your Profihost Team

-------------------------------
Profihost AG
Expo Plaza 1
30539 Hannover
Germany

Tel.: +49 (511) 5151 8181     | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: info@...fihost.com

Registered office: Hannover, VAT ID: DE813460827
Court of registration: Amtsgericht Hannover, Register No.: HRB 202350
Executive Board: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
Supervisory Board: Prof. Dr. iur. Winfried Huck (Chairman)
