linux-kernel - Re: [PATCH] x86/virt/tdx: Make TDX and kexec mutually exclusive at runtime

Message-ID: <ea0b0b1a842ad1fc209438c776f68ffb4ac17b9f.camel@intel.com>
Date: Thu, 17 Apr 2025 18:21:13 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "tglx@...utronix.de" <tglx@...utronix.de>, "peterz@...radead.org"
	<peterz@...radead.org>, "mingo@...hat.com" <mingo@...hat.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "Huang, Kai" <kai.huang@...el.com>, "bp@...en8.de"
	<bp@...en8.de>
CC: "ashish.kalra@....com" <ashish.kalra@....com>, "seanjc@...gle.com"
	<seanjc@...gle.com>, "x86@...nel.org" <x86@...nel.org>, "sagis@...gle.com"
	<sagis@...gle.com>, "hpa@...or.com" <hpa@...or.com>, "Chatre, Reinette"
	<reinette.chatre@...el.com>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "Williams, Dan J"
	<dan.j.williams@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "nik.borisov@...e.com" <nik.borisov@...e.com>
Subject: Re: [PATCH] x86/virt/tdx: Make TDX and kexec mutually exclusive at
 runtime

On Thu, 2025-04-17 at 10:50 -0700, Dave Hansen wrote:
> On 4/16/25 16:02, Kai Huang wrote:
> > Full support for kexec on a TDX host would require complex work.
> > The cache flushing required would need to happen while stopping
> > remote CPUs, which would require changes to a fragile area of the
> > kernel.
> 
> Doesn't kexec already stop remote CPUs? Doesn't this boil down to a
> WBINVD? How is that complex?

When SME support added an SME-only WBINVD to stop_this_cpu(), it caused a
shutdown hang on some particular HW. It turned out there was an existing race
that the slower operation made worse. There were several attempts to fix it,
and finally tglx patched it up with:

  1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")

But in that patch he said the fix "cannot plug all holes either". So while
looking at doing the WBINVD for TDX kexec, I was advocating for giving this a
harder look before building on top of it. The patches to add TDX kexec support
made the WBINVD happen on all bare metal, not just TDX HW. So whatever races
exist would be exposed to a much wider variety of HW than the SME change was
ever tested on.

> 
> > It would also require resetting TDX private pages, which is non-
> > trivial since the core kernel does not track them.
> 
> Why? The next kernel will just use KeyID-0 which will blast the old
> pages away with no side effects... right?

I believe this is talking about support to work around the #MC erratum. Another
version of kexec TDX support used a KVM callback to have KVM reset all the TDX
guest memory it knows about.

> 
> > Lastly, it would have to rely on a yet-to-be documented behavior
> > around the TME key (KeyID 0).
> 
> I'll happily wait for the documentation if you insist on it (I don't).

Ok, thanks. This one is probably more of a bonus reason on top of the above.