Message-ID: <366fc78e7b8c4474958b289eec31ed25@intel.com>
Date: Thu, 7 Jan 2021 00:26:19 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"bp@...en8.de" <bp@...en8.de>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"hpa@...or.com" <hpa@...or.com>,
"kernel-team@...com" <kernel-team@...com>
Subject: RE: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs
> Please see below for an updated patch.
Yes. That worked:
[ 78.946069] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946151] mce: mce_timed_out: MCE holdout CPUs (may include false positives): 24-47,120-143
[ 78.946153] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
I guess more than one CPU hit the timeout, so your new message was printed twice
before the panic code took over?
Once again, the whole of socket 1 is reported MIA, rather than just the pair of threads on one of the cores there.
Still, that's a useful improvement: it eliminates the other three sockets on this system.
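For reference, the CPU list in the message above can be expanded to count the holdouts. This is only a minimal sketch in Python, assuming the list is comma-separated single CPUs and inclusive ranges as printed in the log; the helper name expand_cpulist is hypothetical and not part of the patch.

```python
def expand_cpulist(cpulist):
    """Expand a CPU list string such as "24-47,120-143" into sorted CPU numbers.

    Assumes comma-separated entries, each either a single CPU number
    or an inclusive "first-last" range.
    """
    cpus = set()
    for part in cpulist.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# The holdout list reported in the log above.
holdouts = expand_cpulist("24-47,120-143")
print(len(holdouts))  # 48
```

That is 48 CPU numbers in two ranges of 24 each.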
Tested-by: Tony Luck <tony.luck@...el.com>
-Tony