Message-ID: <Z0TfblQeVRnDc-S1@gmail.com>
Date: Mon, 25 Nov 2024 21:34:54 +0100
From: Ingo Molnar <mingo@...nel.org>
To: David Woodhouse <dwmw2@...radead.org>
Cc: kexec@...ts.infradead.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Kai Huang <kai.huang@...el.com>,
Nikolay Borisov <nik.borisov@...e.com>,
linux-kernel@...r.kernel.org, Simon Horman <horms@...nel.org>,
Dave Young <dyoung@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, jpoimboe@...nel.org
Subject: Re: [RFC PATCH v2 16/16] [DO NOT MERGE] x86/kexec: enable DEBUG
* David Woodhouse <dwmw2@...radead.org> wrote:
> > Just curious: did you write this code to debug the series, or was
> > there some original hair-tearing regression that motivated you? Is
> > there an upstream fix to marvel at and be horrified about in
> > equal measure?
>
> https://lore.kernel.org/all/2ab14f6f-2690-056b-cf9e-38a12dafd728@amd.com/t/#u
> is the upstream fix.

Which ended up being the following upstream commit:

  88a921aa3c6b ("x86/sev: Ensure that RMP table fixups are reserved")

Might make sense to add this commit reference to one of the central
patches of the GDT/IDT code, to document how this feature is able to
pin down very hard to debug regressions. (Even if the upstream fix was
done independently in probably luckier circumstances.)

> [...] It's all the more horrifying because it was already *fixed*
> upstream before I lost weeks of my life to chasing it. And the
> trigger which actually made it *happen*, and made our production
> systems allocate memory within that dangerous 1MiB region adjacent to
> the RMP table, was a tweak to the NMI watchdog period... leading to
> an assumption that we were getting stray perf NMIs during the kexec,
> and a *long* wild goose chase based on that false assumption...
:-/
> Once I'd written the debug code, I just wanted to clean it up a bit
> and push it out for the benefit of others; that *was* the main point
> of this series. All the rest of the cleanups are just yak shaving.
>
> The realisation that we never even explicitly mapped the control code
> page and always just got lucky because it happened to be in the same
> 2MiB or 1GiB superpage as something else that we did map... was just
> a bonus :)
I'm amazed and horrified in equal measure ;-)
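
For readers following along, the "got lucky" condition can be illustrated
with a small standalone sketch. It is not taken from the series; the helper
name and macros below are hypothetical. It only shows the arithmetic: two
addresses are covered by the same large-page mapping when they fall in the
same naturally aligned 2MiB or 1GiB block.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64 large-page sizes: 2 MiB and 1 GiB (names are illustrative only). */
#define SUPERPAGE_2M (UINT64_C(1) << 21)
#define SUPERPAGE_1G (UINT64_C(1) << 30)

/*
 * Hypothetical helper: returns true if addresses a and b lie in the same
 * naturally aligned block of the given power-of-two size. If one of them
 * was mapped with a superpage of that size, the other is reachable through
 * the same mapping without ever having been mapped explicitly.
 */
static bool same_superpage(uint64_t a, uint64_t b, uint64_t size)
{
	uint64_t mask = ~(size - 1);

	return (a & mask) == (b & mask);
}
```

An address just below a 2MiB boundary and one just above it do not share a
2MiB page, but they may still share a 1GiB page, which is why an implicit
dependency like this can go unnoticed for a long time.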
> (That one is fixed in v3 which I'll post shortly, and is already in
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kexec-debug
> )
>
> > I'd argue that this debugging code probably needs a default-off Kconfig
> > option, even with the obvious hard-coded environmental limitations &
> > assumptions it has. Could be useful to very early debugging & would
> > preserve your effort without it bitrotting too obviously.
>
> Yeah. In v3 I've made it a config option, and made it use the
> early_printk serial console (as long as that's an I/O based 8250; we
> can add others too later).

That's lovely!

Thanks,

	Ingo