linux-kernel - RE: [PATCH] mce: fix missing stack-dumping in mce

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <SJ1PR11MB60831AB2202FF0C3CF99EF1DFCF49@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date:   Tue, 3 Jan 2023 21:12:02 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Miaohe Lin <linmiaohe@...wei.com>, Borislav Petkov <bp@...en8.de>
CC:     "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
Subject: RE: [PATCH] mce: fix missing stack-dumping in mce_panic()

>> I guess the original issue that commit was fixing is to save that
>> redundant oops message but Tony seems to want to see it now and I'm not
>> sure how much we care about 80x50 screens nowadays... :-)

I want a stack dump for the specific case of a recoverable machine check caused by
poison consumption in kernel code that doesn't have an extable[] entry for a recovery
path. That's a potential candidate for future kernel change to make that recoverable
(if the code path seems common enough to warrant the churn), and there is some
plausible way for s/w to "recover").

For most other machine checks the dump is very likely useless. E.g. some CPU core stalled
so that the system generates a broadcast machine check because instructions are not
being retired on that CPU core. In this case the machine check "monarch" is almost certainly
some innocent bystander that was executing normally. Stack dump from that CPU is going
to tell you nothing about the machine check.

-Tony