Message-ID: <554c46b80d9cc6c6dce651f839ac3998e9e605e1.camel@intel.com>
Date: Thu, 26 Jun 2025 18:37:17 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Hansen, Dave" <dave.hansen@...el.com>, "Huang, Kai"
<kai.huang@...el.com>, "bp@...en8.de" <bp@...en8.de>, "peterz@...radead.org"
<peterz@...radead.org>, "hpa@...or.com" <hpa@...or.com>, "mingo@...hat.com"
<mingo@...hat.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
"thomas.lendacky@....com" <thomas.lendacky@....com>
CC: "seanjc@...gle.com" <seanjc@...gle.com>, "x86@...nel.org"
<x86@...nel.org>, "sagis@...gle.com" <sagis@...gle.com>, "Chatre, Reinette"
<reinette.chatre@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "kirill.shutemov@...ux.intel.com"
<kirill.shutemov@...ux.intel.com>, "Williams, Dan J"
<dan.j.williams@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "ashish.kalra@....com" <ashish.kalra@....com>,
"Chen, Farrah" <farrah.chen@...el.com>, "nik.borisov@...e.com"
<nik.borisov@...e.com>
Subject: Re: [PATCH v3 2/6] x86/virt/tdx: Mark memory cache state incoherent
when making SEAMCALL
On Thu, 2025-06-26 at 22:48 +1200, Kai Huang wrote:
> On TDX platforms, at hardware level dirty cachelines with and without
> TDX keyID can coexist, and CPU can flush them back to memory in random
> order.
>
Same comment as the previous patch.
> During kexec, the caches must be flushed before jumping to the
> new kernel to avoid silent memory corruption when a cacheline with a
> different encryption property is written back over whatever encryption
> properties the new kernel is using.
>
> A percpu boolean is used to mark whether the cache of a given CPU may be
> in an incoherent state, and the kexec performs WBINVD on the CPUs with
> that boolean turned on.
>
> For TDX, only the TDX module or the TDX guests can generate dirty
> cachelines of TDX private memory, i.e., they are only generated when the
> kernel does SEAMCALL.
Nit: "does a SEAMCALL"?
>
> Turn on that boolean when the kernel does SEAMCALL so that kexec can
Nit: "Turn on" is a little ambiguous. "Set"?
> correctly flush cache.
>
> SEAMCALL can be made from both task context and IRQ disabled context.
SEAMCALLs
> Given SEAMCALL is just a lengthy instruction (e.g., thousands of cycles)
> from kernel's point of view and preempt_{disable|enable}() is cheap
> compared to it, simply unconditionally disable preemption during setting
> the percpu boolean and making SEAMCALL.
>
> Signed-off-by: Kai Huang <kai.huang@...el.com>
> Tested-by: Farrah Chen <farrah.chen@...el.com>
> ---
>
> v2 -> v3:
> - Change to use __always_inline for do_seamcall() to avoid indirect
> call instructions of making SEAMCALL.
How did this come about?
> - Remove the sentence "not all SEAMCALLs generate dirty cachelines of
> TDX private memory but just treat all of them do." in changelog and
> the code comment. -- Dave
>
> ---
> arch/x86/include/asm/tdx.h | 29 ++++++++++++++++++++++++++++-
> 1 file changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 7ddef3a69866..d4c624c69d7f 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -102,10 +102,37 @@ u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
> u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
> void tdx_init(void);
>
> +#include <linux/preempt.h>
> #include <asm/archrandom.h>
> +#include <asm/processor.h>
>
> typedef u64 (*sc_func_t)(u64 fn, struct tdx_module_args *args);
>
> +static __always_inline u64 do_seamcall(sc_func_t func, u64 fn,
> + struct tdx_module_args *args)
> +{
So now we have:
seamcall()
sc_retry()
do_seamcall()
__seamcall()
do_seamcall() is only called from sc_retry(). Why add yet another helper in the
stack? You could just build it into sc_retry().
Oh, and the __seamcall_*() variety is called directly too, which skips the
do_seamcall() per-cpu var logic in those cases. So maybe do_seamcall() is
needed, but it needs a better name considering where it would get called from.
These wrappers need an overhaul I think, but maybe for now just have
do_dirty_seamcall() which is called from tdh_vp_enter() and sc_retry().
Oh no, actually scratch that! The inline/flatten issue will happen again if we
add the per-cpu vars to tdh_vp_enter()... Which means we probably need to set
the per-cpu var in tdx_vcpu_enter_exit(). And the other __seamcall() caller is
the machine check handler...
Am I missing something? It seems this patch is incomplete. If some of these
missed SEAMCALLs don't dirty a cacheline, then the justification that this
works by covering all SEAMCALLs needs to be updated.
Side topic. Do all the SEAMCALL wrappers calling into the seamcall_*() variety
of wrappers need the entropy retry logic? I think no, and some callers actually
depend on it not happening.
> + u64 ret;
> +
> + preempt_disable();
> +
> + /*
> + * SEAMCALLs are made to the TDX module and can generate dirty
> + * cachelines of TDX private memory. Mark cache state incoherent
> + * so that the cache can be flushed during kexec.
> + *
> + * This needs to be done before actually making the SEAMCALL,
> + * because kexec-ing CPU could send NMI to stop remote CPUs,
> + * in which case even disabling IRQ won't help here.
> + */
> + this_cpu_write(cache_state_incoherent, true);
> +
> + ret = func(fn, args);
> +
> + preempt_enable();
> +
> + return ret;
> +}
> +
> static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
> struct tdx_module_args *args)
> {
> @@ -113,7 +140,7 @@ static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
> u64 ret;
>
> do {
> - ret = func(fn, args);
> + ret = do_seamcall(func, fn, args);
> } while (ret == TDX_RND_NO_ENTROPY && --retry);
>
> return ret;