Message-ID: <SJ1PR11MB60831AB2202FF0C3CF99EF1DFCF49@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Tue, 3 Jan 2023 21:12:02 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Miaohe Lin <linmiaohe@...wei.com>, Borislav Petkov <bp@...en8.de>
CC: "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
Subject: RE: [PATCH] mce: fix missing stack-dumping in mce_panic()
>> I guess the original issue that commit was fixing is to save that
>> redundant oops message but Tony seems to want to see it now and I'm not
>> sure how much we care about 80x50 screens nowadays... :-)
I want a stack dump for the specific case of a recoverable machine check caused by
poison consumption in kernel code that doesn't have an extable[] entry for a recovery
path. That's a potential candidate for a future kernel change to make it recoverable
(if the code path seems common enough to warrant the churn, and there is some
plausible way for s/w to "recover").
For most other machine checks the dump is very likely useless. E.g. some CPU core has stalled,
so the system generates a broadcast machine check because instructions are not
being retired on that core. In this case the machine check "monarch" is almost certainly
some innocent bystander that was executing normally, and a stack dump from that CPU
tells you nothing about the machine check.
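The policy described above can be sketched as a small predicate. This is a hypothetical illustration only: the enum, the struct fields, and the function name are invented for this sketch and are not the kernel's MCE API (the real handling lives under arch/x86/kernel/cpu/mce/ with its own severity grading). It assumes the handler already knows whether the error was recoverable poison consumption, whether it happened in kernel mode, and whether the faulting instruction has an exception-table fixup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a machine check (NOT kernel names). */
enum mce_kind {
    MCE_KIND_POISON_CONSUMED, /* CPU consumed poisoned memory */
    MCE_KIND_CORE_STALL,      /* broadcast #MC: a core stopped retiring instructions */
    MCE_KIND_OTHER,
};

/* Hypothetical summary of what the handler knows about the event. */
struct mce_event {
    enum mce_kind kind;
    bool recoverable;       /* hardware reports software may be able to recover */
    bool in_kernel_mode;    /* poison was consumed while executing kernel code */
    bool has_extable_fixup; /* faulting instruction has an extable[] entry */
};

/* Returns true only when a stack dump is likely to point at useful code. */
static bool mce_stack_dump_useful(const struct mce_event *m)
{
    /* Stalls and other errors: the monarch CPU's stack is unrelated. */
    if (m->kind != MCE_KIND_POISON_CONSUMED || !m->recoverable)
        return false;
    /* Poison consumed in user mode is handled without panicking the kernel. */
    if (!m->in_kernel_mode)
        return false;
    /* With a fixup the kernel recovers instead of panicking; without one,
     * the stack shows a kernel path that might be made recoverable. */
    return !m->has_extable_fixup;
}
```

Keeping the decision in one predicate makes it easy to call from the panic path and leave the default (no dump) for every case other than kernel-mode poison consumption without a recovery path.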
-Tony