Message-ID: <20151208185644.GE27180@pd.tnic>
Date:	Tue, 8 Dec 2015 19:56:44 +0100
From:	Borislav Petkov <bp@...en8.de>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	"Raj, Ashok" <ashok.raj@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: [Patch V2] x86, mce: Ensure offline CPU's don't participate in
 mce rendezvous process.

On Tue, Dec 08, 2015 at 03:59:58PM +0000, Luck, Tony wrote:
> > No, the system did panic in both times. The "strange" observation is
> > that the MCE gets reported only on the cores on node 0. Or at least only
> > the printks from mce_panic() on the cores on node0 reach the serial
> > console.
> 
> You only see messages and logs from node0, because the cpus there are
> the only ones that see any errors logged in their banks.
> 
> The cpus on node 1, 2, 3 scan all banks and find nothing, so say nothing.

Right, sure, of course. Doh!

Confirmation:

[  183.840517] mce: do_machine_check: CPU: 30
[  183.840531] mce: do_machine_check: CPU: 27
[  183.840536] mce: do_machine_check: CPU: 29
[  183.840541] mce: do_machine_check: CPU: 56
[  183.840546] mce: do_machine_check: CPU: 28
[  183.840548] mce: do_machine_check: CPU: 60
[  183.840550] mce: do_machine_check: CPU: 24
[  183.840557] mce: do_machine_check: CPU: 12
[  183.840561] mce: do_machine_check: CPU: 45
[  183.840565] mce: do_machine_check: CPU: 59
[  183.840569] mce: do_machine_check: CPU: 57
[  183.840572] mce: do_machine_check: CPU: 61
[  183.840584] mce: do_machine_check: CPU: 0
[  183.840587] mce: do_machine_check: CPU: 32
[  183.840593] mce: do_machine_check: CPU: 63
[  183.840596] mce: do_machine_check: CPU: 31
[  183.840602] mce: do_machine_check: CPU: 42
[  183.840606] mce: do_machine_check: CPU: 11
[  183.840611] mce: do_machine_check: CPU: 41
[  183.840613] mce: do_machine_check: CPU: 9
[  183.840617] mce: do_machine_check: CPU: 62
[  183.840619] mce: do_machine_check: CPU: 25
[  183.840624] mce: do_machine_check: CPU: 58
[  183.840627] mce: do_machine_check: CPU: 26
[  183.840633] mce: do_machine_check: CPU: 5
[  183.840638] mce: do_machine_check: CPU: 1
[  183.840642] mce: do_machine_check: CPU: 37
[  183.840648] mce: do_machine_check: CPU: 15
[  183.840650] mce: do_machine_check: CPU: 47
[  183.840653] mce: do_machine_check: CPU: 44
[  183.840657] mce: do_machine_check: CPU: 14
[  183.840659] mce: do_machine_check: CPU: 46
[  183.840666] mce: do_machine_check: CPU: 52
[  183.840670] mce: do_machine_check: CPU: 50
[  183.840675] mce: do_machine_check: CPU: 48
[  183.840677] mce: do_machine_check: CPU: 16
[  183.840682] mce: do_machine_check: CPU: 54
[  183.840686] mce: do_machine_check: CPU: 18
[  183.840692] mce: do_machine_check: CPU: 40
[  183.840695] mce: do_machine_check: CPU: 8
[  183.840701] mce: do_machine_check: CPU: 2
[  183.840705] mce: do_machine_check: CPU: 20
[  183.840710] mce: do_machine_check: CPU: 13
[  183.840712] mce: do_machine_check: CPU: 43
[  183.840716] mce: do_machine_check: CPU: 10
[  183.840722] mce: do_machine_check: CPU: 3
[  183.840724] mce: do_machine_check: CPU: 35
[  183.840727] mce: do_machine_check: CPU: 33
[  183.840730] mce: do_machine_check: CPU: 34
[  183.840734] mce: do_machine_check: CPU: 6
[  183.840738] mce: do_machine_check: CPU: 38
[  183.840743] mce: do_machine_check: CPU: 53
[  183.840745] mce: do_machine_check: CPU: 21
[  183.840750] mce: do_machine_check: CPU: 23
[  183.840752] mce: do_machine_check: CPU: 55
[  183.840755] mce: do_machine_check: CPU: 22
[  183.840759] mce: do_machine_check: CPU: 49
[  183.840761] mce: do_machine_check: CPU: 17
[  183.840767] mce: do_machine_check: CPU: 19
[  183.840770] mce: do_machine_check: CPU: 51
[  183.840776] mce: do_machine_check: CPU: 39
[  183.840778] mce: do_machine_check: CPU: 7
[  183.840784] mce: do_machine_check: CPU: 36
[  183.840786] mce: do_machine_check: CPU: 4
[  184.485104] Disabling lock debugging due to kernel taint
[  184.498006] mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
[  184.498023] mce: [Hardware Error]: Machine check events logged
[  184.531428] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8135de7f> {intel_idle+0xbf/0x130}
[  184.551126] mce: [Hardware Error]: TSC c760ad064ccce ADDR bb68ec00 MISC 421c8c86 
[  184.568358] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1449600598 SOCKET 0 APIC 1 microcode 710
[  184.588862] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
...

mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 5: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
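
For reference, the status value repeated in the lines above can be unpacked by hand. What follows is only a minimal sketch, assuming the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63..57, model-specific code in bits 31:16, MCA error code in bits 15:0). The names decode_mci_status() and is_memory_controller_error() are made up for illustration and are not kernel functions.

```python
# Sketch only: decode an IA32_MCi_STATUS value as printed by mce_panic().
# Bit positions follow the architectural layout in the Intel SDM; the
# helper names are hypothetical.

STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit status value into flag names and error codes."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mscod": (status >> 16) & 0xFFFF,  # model-specific error code
        "mcacod": status & 0xFFFF,         # architectural MCA error code
    }


def is_memory_controller_error(mcacod):
    """True for compound codes of the form 0000 0000 1MMM CCCC (bit 12 ignored)."""
    return (mcacod & 0xEF80) == 0x0080


if __name__ == "__main__":
    decoded = decode_mci_status(0xBE00000000010090)
    print(decoded["flags"], hex(decoded["mscod"]), hex(decoded["mcacod"]))
    print("memory controller error:",
          is_memory_controller_error(decoded["mcacod"]))
```

Under that layout the value decodes to an uncorrected error with PCC set and a memory controller error code, which fits both the panic and the sbridge EDAC line.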

CPUs:

[    1.103200] x86: Booting SMP configuration:
[    1.112441] .... node  #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    1.227835] .... node  #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.451861] .... node  #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.674819] .... node  #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    1.899011] .... node  #0, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.026616] .... node  #1, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    2.152645] .... node  #2, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    2.276782] .... node  #3, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    2.402263] x86: Booted up 4 nodes, 64 CPUs
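
The cross-check between the two excerpts above can also be done mechanically. This is a rough sketch, not a tool from the tree: cpu_to_node() and reporting_cpus() are hypothetical helpers, and it assumes CPU 0 is the boot CPU on node 0, since the boot log does not list it explicitly.

```python
import re

# Log excerpts copied from this mail.
BOOT_LOG = """\
[    1.112441] .... node  #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    1.227835] .... node  #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.451861] .... node  #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.674819] .... node  #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    1.899011] .... node  #0, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.026616] .... node  #1, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    2.152645] .... node  #2, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    2.276782] .... node  #3, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
"""

MCE_LOG = """\
mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 1: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 2: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 32: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 33: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 35: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 36: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 37: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 38: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 39: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 3: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 5: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 6: Machine Check Exception: 5 Bank 5: be00000000010090
mce: [Hardware Error]: CPU 7: Machine Check Exception: 5 Bank 5: be00000000010090
"""


def cpu_to_node(boot_log, boot_cpu_node=0):
    """Map CPU number -> node from the 'Booting SMP configuration' lines."""
    # Assumption: CPU 0 is the boot CPU and is not printed in the log.
    mapping = {0: boot_cpu_node}
    for line in boot_log.splitlines():
        match = re.search(r"node\s+#(\d+), CPUs:(.*)", line)
        if not match:
            continue
        node = int(match.group(1))
        for cpu in re.findall(r"#(\d+)", match.group(2)):
            mapping[int(cpu)] = node
    return mapping


def reporting_cpus(mce_log):
    """Set of CPUs that printed a 'Machine Check Exception' line."""
    return {int(cpu) for cpu in
            re.findall(r"CPU (\d+): Machine Check Exception", mce_log)}


if __name__ == "__main__":
    nodes = cpu_to_node(BOOT_LOG)
    reporters = reporting_cpus(MCE_LOG)
    print("nodes with reporting CPUs:", sorted({nodes[c] for c in reporters}))
```

With these excerpts, every CPU that printed an MCE line maps to node 0, which matches Tony's explanation.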

Ok, all clear.

Thanks!

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
