[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b63056f9-0709-736b-ea5b-5e903410cb1d@huawei.com>
Date: Fri, 6 Jan 2023 09:57:12 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: "Luck, Tony" <tony.luck@...el.com>
CC: "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] mce: fix missing stack-dumping in mce_panic()
On 2023/1/4 5:12, Luck, Tony wrote:
>>> I guess the original issue that commit was fixing is to save that
>>> redundant oops message but Tony seems to want to see it now and I'm not
>>> sure how much we care about 80x50 screens nowadays... :-)
>
Many thanks for your thought. :)
> I want a stack dump for the specific case of a recoverable machine check caused by
> poison consumption in kernel code that doesn't have an extable[] entry for a recovery
> path. That's a potential candidate for future kernel change to make that recoverable
Sure, in this case a stack dump will be really helpful. We can gather these stack dumps
and try to make the most frequent machine check scene recoverable.
> (if the code path seems common enough to warrant the churn), and there is some
> plausible way for s/w to "recover").
>
> For most other machine checks the dump is very likely useless. E.g. some CPU core stalled
> so that the system generates a broadcast machine check because instructions are not
> being retired on that CPU core. In this case the machine check "monarch" is almost certainly
> some innocent bystander that was executing normally. Stack dump from that CPU is going
> to tell you nothing about the machine check.
Agree. A stack dump won't be helpful in this case. But I tend to keep the stack dump in case
it would be helpful and also make mce_panic() dumps the stack as expected. What do you think?
Thanks,
Miaohe Lin
Powered by blists - more mailing lists