linux-kernel - Re: [PATCH v3] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic

Message-ID: <20231017111817.GAZS5teT4rFkXVD2KA@fat_crate.local>
Date:   Tue, 17 Oct 2023 13:18:17 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     "Li, Zhiquan1" <zhiquan1.li@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "patches@...ts.linux.dev" <patches@...ts.linux.dev>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "naoya.horiguchi@....com" <naoya.horiguchi@....com>
Subject: Re: [PATCH v3] x86/mce: Set PG_hwpoison page flag to avoid the
 capture kernel panic

On Tue, Oct 17, 2023 at 01:24:53AM +0000, Luck, Tony wrote:
> How about:
>
> When there is a fatal machine check, Linux calls mce_panic()
> without checking to see if bad data at some memory address
> was reported in the machine check banks.

... for the simple reason that the kernel cannot afford to do any
unnecessary work and must panic immediately so that it can stop the
propagation of bad data.

Now, it's a whole different story whether that's the right thing to do
and whether the data has already propagated so that the panic is moot.

The whole point I'm trying to make is that the machine panics because
the error severity dictates it. And there's no opportunity to queue
recovery work because the kernel simply cannot do so in that case. So
the commit message should simply state that we're marking the page as
poisoned for the kexec'ed kernel's sake and not because of anything else.

> If kexec is enabled, check for memory errors and mark the
> page as poisoned so that the kexec'd kernel can avoid accessing
> the page.

Yap, yours makes sense.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette