Message-ID: <31e17bc8-2e9e-4e93-a912-3d54826e59d0@intel.com>
Date: Thu, 17 Apr 2025 11:56:11 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
 "tglx@...utronix.de" <tglx@...utronix.de>,
 "peterz@...radead.org" <peterz@...radead.org>,
 "mingo@...hat.com" <mingo@...hat.com>, "Huang, Kai" <kai.huang@...el.com>,
 "bp@...en8.de" <bp@...en8.de>
Cc: "ashish.kalra@....com" <ashish.kalra@....com>,
 "seanjc@...gle.com" <seanjc@...gle.com>, "x86@...nel.org" <x86@...nel.org>,
 "sagis@...gle.com" <sagis@...gle.com>, "hpa@...or.com" <hpa@...or.com>,
 "Chatre, Reinette" <reinette.chatre@...el.com>,
 "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
 "Williams, Dan J" <dan.j.williams@...el.com>,
 "pbonzini@...hat.com" <pbonzini@...hat.com>,
 "thomas.lendacky@....com" <thomas.lendacky@....com>,
 "Yamahata, Isaku" <isaku.yamahata@...el.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "nik.borisov@...e.com" <nik.borisov@...e.com>
Subject: Re: [PATCH] x86/virt/tdx: Make TDX and kexec mutually exclusive at
 runtime

On 4/17/25 11:21, Edgecombe, Rick P wrote:
> On Thu, 2025-04-17 at 10:50 -0700, Dave Hansen wrote:
>> On 4/16/25 16:02, Kai Huang wrote:
>>> Full support for kexec on a TDX host would require complex work.
>>> The cache flushing required would need to happen while stopping
>>> remote CPUs, which would require changes to a fragile area of the
>>> kernel.
>>
>> Doesn't kexec already stop remote CPUs? Doesn't this boil down to a
>> WBINVD? How is that complex?
> 
> When SME added an SME-only WBINVD in stop_this_cpu() it caused a shutdown hang
> on some particular HW. It turns out there was an existing race that was made
> worse by the slower operation. It went through some attempts to fix it, and
> finally tglx patched it up with:
> 
>   1f5e7eb7868e ("x86/smp: Make stop_other_cpus() more robust")
> 
> But in that patch he said the fix "cannot plug all holes either". So while
> looking at doing the WBINVD for TDX kexec, I was advocating for giving this a
> harder look before building on top of it. The patches to add TDX kexec support
> made the WBINVD happen on all bare metal, not just TDX HW. So whatever races
> exist would be exposed to a much wider variety of HW than SME tested out.

I get it. Adding WBINVD to this same path caused some pain before. But
just turning off the feature that calls this path seems like overkill.

How about we try to push WBINVD out of this path? It should be quite
doable for TDX, I think.

Let's say we had a percpu bool. It gets set on each CPU when SME is
enabled on the system. It also gets set when TDX is enabled. The kexec
code becomes:

-	if (SME)
+	if (per_cpu(newbool))
		wbinvd();

No TDX, no new wbinvd(). If SME, no change.

Now, here's where it gets fun. The bool can get _cleared_ after WBINVD
is executed on a CPU, at least on TDX systems. It then also needs to get
set after TDX might dirty a cacheline.

	TDCALL(); // dirties stuff
	per_cpu(newbool) = 1;

Then you can also do this on_each_cpu():

	wbinvd();
	per_cpu(newbool) = 0;

hopefully at a point after you're sure no more TDCALLs are being made. If
you screw it up, no biggie: the kexec-time one will make up for it, at
the cost of exposing TDX systems to the kexec timing bugs. But if the
on_each_cpu() thing works in the common case, you get no additional bug
exposure.
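
To make the flag lifecycle above concrete, here is a rough userspace
sketch of it. Nothing in it is kernel code: NR_CPUS, cache_maybe_dirty[],
wbinvd_count[] and all of the sim_*() helpers are made-up names for
illustration. A plain array stands in for the percpu bool and a counter
stands in for the real WBINVD.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the simulation */

/* Stand-in for the percpu bool ("newbool"): cache may hold dirty lines. */
bool cache_maybe_dirty[NR_CPUS];
/* Number of simulated WBINVDs issued on each CPU. */
int wbinvd_count[NR_CPUS];

/* Stand-in for wbinvd(); only counts invocations. */
void sim_wbinvd(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	wbinvd_count[cpu]++;
}

/* Set the flag after a TDX call that might have dirtied a cacheline. */
void sim_tdx_call(int cpu)
{
	/* ... the call that dirties stuff would go here ... */
	cache_maybe_dirty[cpu] = true;
}

/*
 * Early flush, meant to run on each CPU once no more TDX calls are
 * expected: flush first, then clear the flag.
 */
void sim_early_flush(int cpu)
{
	sim_wbinvd(cpu);
	cache_maybe_dirty[cpu] = false;
}

/* kexec-time path: only flush if this CPU is still marked dirty. */
void sim_stop_this_cpu(int cpu)
{
	if (cache_maybe_dirty[cpu])
		sim_wbinvd(cpu);
}
```

In this model a CPU that got the early flush does no WBINVD in the
kexec-time path, a CPU that missed it still gets flushed there, and a
CPU that never made a TDX call never flushes at all. The sketch ignores
ordering against concurrent TDX calls, which is the part that would
need care in a real implementation.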

>>> It would also require resetting TDX private pages, which is non-
>>> trivial since the core kernel does not track them.
>>
>> Why? The next kernel will just use KeyID-0 which will blast the old
>> pages away with no side effects... right?
> 
> I believe this is talking about support to work around the #MC errata. Another
> version of kexec TDX support used a KVM callback to have it reset all the TDX
> guest memory it knows about.

So, let's just not support hardware with that erratum upstream.

