Date:   Thu, 7 Jan 2021 00:26:19 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     "paulmck@...nel.org" <paulmck@...nel.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "kernel-team@...com" <kernel-team@...com>
Subject: RE: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs

> Please see below for an updated patch.

Yes. That worked:

[   78.946069] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[   78.946151] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[   78.946153] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

I guess that more than one CPU hit the timeout and so your new message was printed twice
before the panic code took over?

Once again, the whole of socket 1 is MIA rather than just the pair of threads on one of the cores there.
But that's a useful improvement (eliminating the other three sockets on this system).
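
For reference, the range list in the holdout message can be expanded with a few lines of Python to see how many logical CPUs it covers. This is only a sketch: parse_cpulist is a hypothetical helper written for illustration, not part of the patch or the kernel, and it assumes the list uses plain "a-b" ranges and single numbers separated by commas, as in the log lines above.

```python
def parse_cpulist(text):
    """Expand a range list such as "24-47,120-143" into a sorted list of ints."""
    cpus = set()
    for chunk in text.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            # Inclusive range, e.g. "24-47" covers 24 through 47.
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


# The holdout list reported in the log lines above.
holdouts = parse_cpulist("24-47,120-143")
print(len(holdouts))  # 48
```

The list covers 48 logical CPUs in two blocks of 24, far more than the two threads of a single core.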

Tested-by: Tony Luck <tony.luck@...el.com>

-Tony
