linux-kernel - Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170123175130.l7c7mnmu74ln5v6h@pd.tnic>
Date:   Mon, 23 Jan 2017 18:51:30 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     xlpang@...hat.com, x86@...nel.org, linux-kernel@...r.kernel.org,
        kexec@...ts.infradead.org, Ingo Molnar <mingo@...hat.com>,
        Dave Young <dyoung@...hat.com>,
        Prarit Bhargava <prarit@...hat.com>,
        Junichi Nomura <j-nomura@...jp.nec.com>,
        Kiyoshi Ueda <k-ueda@...jp.nec.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after
 system panic

Hey Tony,

a "welcome back" is in order? :-)

On Mon, Jan 23, 2017 at 09:40:09AM -0800, Luck, Tony wrote:
> If the system had experienced some memory corruption, but
> recovered ... then there would be some pages sitting around
> that the old kernel had marked as POISON and stopped using.
> The kexec'd kernel doesn't know about these, so may touch that
> memory while taking a crash dump ...

Hmm, pass a list of poisoned pages to the kdump kernel so as not to
touch. Looks like there's already functionality for that:

"makedumpfile can exclude the following types of pages while copying
VMCORE to DUMPFILE, and a user can choose which type of pages will be
excluded.

- Pages filled with zero
- Cache pages
- User process data pages
- Free pages"

 (there is a makedumpfile manpage somewhere)

And apparently crash knows about poisoned pages and handles them:

static int __init crash_save_vmcoreinfo_init(void)
{
	...
#ifdef CONFIG_MEMORY_FAILURE
        VMCOREINFO_NUMBER(PG_hwpoison);
#endif

so if that works, the kexeced kernel should know about that list.

> and then you have a broadcast machine check (on older[1] Intel CPUs
> that don't support local machine check).

Right.

> This is hard to work around. You really need all the CPUs to have set
> CR4.MCE=1 (if any didn't, then they will force a reset when they see
> the machine check). Also you need to make sure that they jump to the
> copy of do_machine_check() in the new kernel, not the old kernel.

Doesn't matter, right? The new copy is as clueless as the old one about
those MCEs.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.