Message-ID: <Y71XPl8br2QU2L8E@zn.tnic>
Date: Tue, 10 Jan 2023 13:17:02 +0100
From: Borislav Petkov <bp@...en8.de>
To: Ingo Molnar <mingo@...nel.org>
Cc: Zeng Heng <zengheng4@...wei.com>, michael.roth@....com,
hpa@...or.com, tglx@...utronix.de,
sathyanarayanan.kuppuswamy@...ux.intel.com,
kirill.shutemov@...ux.intel.com, jroedel@...e.de,
keescook@...omium.org, mingo@...hat.com,
dave.hansen@...ux.intel.com, linux-kernel@...r.kernel.org,
x86@...nel.org, liwei391@...wei.com,
Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH -v2] x86/boot/compressed: Register dummy NMI handler in
EFI boot loader, to avoid kdump crashes
On Tue, Jan 10, 2023 at 01:11:29PM +0100, Borislav Petkov wrote:
> On Tue, Jan 10, 2023 at 01:01:06PM +0100, Ingo Molnar wrote:
> > From: Zeng Heng <zengheng4@...wei.com>
> > Date: Tue, 10 Jan 2023 18:27:45 +0800
> > Subject: [PATCH] x86/boot/compressed: Register dummy NMI handler in EFI boot loader, to avoid kdump crashes
> >
> > If kdump is enabled, when using mce_inject to inject errors, EFI
>
> Why does "EFI" matter here? Any boot loader would do...
>
> > boot loader would decompress & load second kernel for saving the
>
> s/&/and/
>
> > vmcore file.
> >
> > For normal errors that is fine.
>
> Useless sentence.
>
> > However, in the MCE case, the panic
> > CPU that firstly enters into mce_panic() is running within NMI
> > interrupt context,
>
> "#MC context": it is non-maskable, but that's not "NMI interrupt context".
>
> > and the processor blocks delivery of subsequent
> > NMIs until the next execution of the IRET instruction.
> >
> > When the panic CPU takes long time in the panic processing route,
>
> I'm still unclear on the order of events here. It sounds like
>
> 1. MCE injected
> 2. panic
> 3. kdump gets loaded
>
> If that is the case, then I presume the flow is:
>
> mce_panic -> panic -> __crash_kexec()
>
> Yes?
>
> If so, then we should make sure we have *exited* #MC context before calling
> panic() and not have to add hacks like this one of adding an empty NMI handler.
>
> But I'm only speculating as it is hard to make sense of all this text.
IOW, does this help?
---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7832a69d170e..55437d8a4fad 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -287,6 +287,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
 		panic(msg);
+		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
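
For anyone following along who doesn't have the register layout in their head:
below is a minimal userspace sketch, not kernel code, of the architectural rule
that a second machine check arriving while IA32_MCG_STATUS.MCIP is still set
shuts the CPU down, and that writing 0 to the register clears MCIP again. The
names fake_mcg_status, model_raise_mce(), model_clear_mcg_status() and
model_demo() are made up for illustration only. The sketch deliberately does
not model NMI blocking or IRET, so it says nothing about whether the one-liner
above is sufficient.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit layout of IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV	(1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV	(1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP	(1ULL << 2)	/* machine check in progress */

/* Hypothetical stand-in for the real MSR; plain memory, no hardware access. */
static uint64_t fake_mcg_status;

/*
 * Hypothetical model of #MC delivery.
 * Returns 0 if the event is delivered to the handler, -1 if the CPU would
 * shut down because a machine check is already marked as in progress.
 */
int model_raise_mce(void)
{
	if (fake_mcg_status & MCG_STATUS_MCIP)
		return -1;

	/* Hardware sets MCIP when it delivers the machine check. */
	fake_mcg_status |= MCG_STATUS_MCIP;
	return 0;
}

/* Models the effect of writing 0 to the status register: MCIP is cleared. */
void model_clear_mcg_status(void)
{
	fake_mcg_status = 0;
}

/* Walks through the sequence: first #MC, nested #MC, clear, #MC again. */
void model_demo(void)
{
	fake_mcg_status = 0;

	assert(model_raise_mce() == 0);		/* first #MC is delivered */
	assert(model_raise_mce() == -1);	/* nested #MC with MCIP set */

	model_clear_mcg_status();
	assert(model_raise_mce() == 0);		/* deliverable again after clear */
}
```

Calling model_demo() from a test main() runs all three cases; the asserts
document the expected outcome of each step in this toy model only.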
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette