Message-ID: <b63056f9-0709-736b-ea5b-5e903410cb1d@huawei.com>
Date:   Fri, 6 Jan 2023 09:57:12 +0800
From:   Miaohe Lin <linmiaohe@...wei.com>
To:     "Luck, Tony" <tony.luck@...el.com>
CC:     "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] mce: fix missing stack-dumping in mce_panic()

On 2023/1/4 5:12, Luck, Tony wrote:
>>> I guess the original issue that commit was fixing is to save that
>>> redundant oops message but Tony seems to want to see it now and I'm not
>>> sure how much we care about 80x50 screens nowadays... :-)
> 

Many thanks for your thoughts. :)

> I want a stack dump for the specific case of a recoverable machine check caused by
> poison consumption in kernel code that doesn't have an extable[] entry for a recovery
> path. That's a potential candidate for future kernel change to make that recoverable

Sure, in this case a stack dump will be really helpful. We can gather these stack dumps
and try to make the most frequent machine check scenarios recoverable.

> (if the code path seems common enough to warrant the churn), and there is some
> plausible way for s/w to "recover").
> 
> For most other machine checks the dump is very likely useless. E.g. some CPU core stalled
> so that the system generates a broadcast machine check because instructions are not
> being retired on that CPU core. In this case the machine check "monarch" is almost certainly
> some innocent bystander that was executing normally. Stack dump from that CPU is going
> to tell you nothing about the machine check.

Agreed. A stack dump won't be helpful in this case. But I'd prefer to keep the stack dump in case
it turns out to be helpful, and to make mce_panic() dump the stack as expected. What do you think?

Thanks,
Miaohe Lin
