Message-ID: <Z0TfblQeVRnDc-S1@gmail.com>
Date: Mon, 25 Nov 2024 21:34:54 +0100
From: Ingo Molnar <mingo@...nel.org>
To: David Woodhouse <dwmw2@...radead.org>
Cc: kexec@...ts.infradead.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Kai Huang <kai.huang@...el.com>,
	Nikolay Borisov <nik.borisov@...e.com>,
	linux-kernel@...r.kernel.org, Simon Horman <horms@...nel.org>,
	Dave Young <dyoung@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>, jpoimboe@...nel.org
Subject: Re: [RFC PATCH v2 16/16] [DO NOT MERGE] x86/kexec: enable DEBUG


* David Woodhouse <dwmw2@...radead.org> wrote:

> > Just curious: did you write this code to debug the series, or was 
> > there some original hair-tearing regression that motivated you? Is 
> > there an upstream fix to marvel at and be horrified about in 
> > equal measure?
> 
> https://lore.kernel.org/all/2ab14f6f-2690-056b-cf9e-38a12dafd728@amd.com/t/#u
> is the upstream fix.

Which ended up being the following upstream commit:

  88a921aa3c6b ("x86/sev: Ensure that RMP table fixups are reserved")

Might make sense to add this commit reference to one of the central 
patches of the GDT/IDT code, to document how this feature is able to 
pin down very hard-to-debug regressions. (Even if the upstream fix was 
done independently in probably luckier circumstances.)

> [...] It's all the more horrifying because it was already *fixed* 
> upstream before I lost weeks of my life to chasing it. And the 
> trigger which actually made it *happen*, and made our production 
> systems allocate memory within that dangerous 1MiB region adjacent to 
> the RMP table, was a tweak to the NMI watchdog period... leading to 
> an assumption that we were getting stray perf NMIs during the kexec, 
> and a *long* wild goose chase based on that false assumption...

:-/

> Once I'd written the debug code, I just wanted to clean it up a bit 
> and push it out for the benefit of others; that *was* the main point 
> of this series. All the rest of the cleanups are just yak shaving.
> 
> The realisation that we never even explicitly mapped the control code 
> page and always just got lucky because it happened to be in the same 
> 2MiB or 1GiB superpage as something else that we did map... was just 
> a bonus :)

I'm amazed and horrified in equal measure ;-)

> (That one is fixed in v3 which I'll post shortly, and is already in 
> https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kexec-debug
> )
> 
> > I'd argue that this debugging code probably needs a default-off Kconfig 
> > option, even with the obvious hard-coded environmental limitations & 
> > assumptions it has. Could be useful for very early debugging & would 
> > preserve your effort without it bitrotting too obviously.
> 
> Yeah. In v3 I've made it a config option, and made it use the 
> early_printk serial console (as long as that's an I/O based 8250; we 
> can add others too later).

That's lovely!

Thanks,

	Ingo
