linux-kernel - Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs

Message-ID: <aXywVcqbXodADg4a@intel.com>
Date: Fri, 30 Jan 2026 21:21:25 +0800
From: Chao Gao <chao.gao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: <linux-coco@...ts.linux.dev>, <linux-kernel@...r.kernel.org>,
	<kvm@...r.kernel.org>, <x86@...nel.org>, <reinette.chatre@...el.com>,
	<ira.weiny@...el.com>, <kai.huang@...el.com>, <dan.j.williams@...el.com>,
	<yilun.xu@...ux.intel.com>, <sagis@...gle.com>, <vannapurve@...gle.com>,
	<paulmck@...nel.org>, <nik.borisov@...e.com>, <zhenzhong.duan@...el.com>,
	<seanjc@...gle.com>, <rick.p.edgecombe@...el.com>, <kas@...nel.org>,
	<dave.hansen@...ux.intel.com>, <vishal.l.verma@...el.com>, Farrah Chen
	<farrah.chen@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
	<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, "H. Peter Anvin"
	<hpa@...or.com>
Subject: Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for
 P-SEAMLDR SEAMCALLs

On Wed, Jan 28, 2026 at 03:36:49PM -0800, Dave Hansen wrote:
>On 1/23/26 06:55, Chao Gao wrote:
>...
>> +static __maybe_unused int seamldr_call(u64 fn, struct tdx_module_args *args)
>> +{
>> +	unsigned long flags;
>> +	u64 vmcs;
>> +	int ret;
>> +
>> +	if (!is_seamldr_call(fn))
>> +		return -EINVAL;
>
>Why is this here? We shouldn't be silently papering over kernel bugs.
>This is a WARN_ON() at *best*, but it also begs the question of how a
>non-SEAMLDR call even got here.

Only SEAMLDR calls can get here. I will make it a WARN_ON_ONCE().
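
For illustration only, here is a userspace sketch of what that check could
look like. It is not the kernel code: mock_warn_on_once() is a hypothetical
stand-in for WARN_ON_ONCE(), seamldr_call_checked() is a made-up name, and
the assumption that P-SEAMLDR leaf numbers are identified by bit 63 is mine,
not taken from the quoted hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: P-SEAMLDR SEAMCALL leaf numbers have bit 63 set. */
#define SEAMLDR_FN_BIT (1ULL << 63)

static bool is_seamldr_call(uint64_t fn)
{
	return (fn & SEAMLDR_FN_BIT) != 0;
}

/* Counts how many warnings were actually printed. */
static int warn_count;

/*
 * Hypothetical userspace stand-in for WARN_ON_ONCE(): print a warning the
 * first time the condition is true, and always return the condition so the
 * caller can still bail out.
 */
static bool mock_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Reject non-SEAMLDR leaves loudly instead of silently. */
static int seamldr_call_checked(uint64_t fn)
{
	if (mock_warn_on_once(!is_seamldr_call(fn),
			      "non-SEAMLDR leaf passed to seamldr_call()"))
		return -EINVAL;

	/* The real function would issue the SEAMCALL here. */
	return 0;
}
```

A bad leaf still returns -EINVAL, but the first offender now leaves a trace
in the log instead of being papered over.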

>
>> +	/*
>> +	 * SEAMRET from P-SEAMLDR invalidates the current VMCS.  Save/restore
>> +	 * the VMCS across P-SEAMLDR SEAMCALLs to avoid clobbering KVM state.
>> +	 * Disable interrupts as KVM is allowed to do VMREAD/VMWRITE in IRQ
>> +	 * context (but not NMI context).
>> +	 */
>
>I think you mean:
>
>	WARN_ON(in_nmi());

This function only disables interrupts, not NMIs. Kirill questioned whether any
KVM operations might execute from NMI context and do VMREAD/VMWRITE. If such
operations exist and an NMI interrupts seamldr_call(), they could encounter
an invalid current VMCS.

The problematic scenario is:

	seamldr_call()			KVM code in NMI handler

1.	vmptrst // save current-vmcs
2.	seamcall // clobber current-vmcs
3.					// NMI handler start
					call into some KVM code and do vmread/vmwrite
					// consume __invalid__ current-vmcs
					// NMI handler end
4.	vmptrld // restore current-vmcs

The comment clarifies that KVM doesn't do VMREAD/VMWRITE during NMI handling.
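
The window in the diagram above can be modelled in plain userspace C. This is
only a toy model under stated assumptions: current_vmcs stands in for the
CPU's current-VMCS pointer, the all-ones value stands in for "no valid current
VMCS", and model_seamldr_call() and the NMI hook are hypothetical names that
exist nowhere in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Model value for "no valid current VMCS". */
#define INVALID_VMCS UINT64_MAX

/* Models the CPU's current-VMCS pointer. */
static uint64_t current_vmcs = INVALID_VMCS;

/* What a (hypothetical) NMI handler observed as the current VMCS. */
static uint64_t seen_in_nmi;

typedef void (*nmi_hook_t)(void);

/* Hypothetical NMI handler that would go on to do vmread/vmwrite. */
static void record_vmcs_in_nmi(void)
{
	seen_in_nmi = current_vmcs;
}

/* Walks through steps 1-4 of the diagram, firing the NMI hook at step 3. */
static void model_seamldr_call(nmi_hook_t nmi)
{
	uint64_t saved = current_vmcs;	/* 1. vmptrst: save current-vmcs */

	current_vmcs = INVALID_VMCS;	/* 2. seamcall: clobber current-vmcs */

	if (nmi != NULL)
		nmi();			/* 3. NMI lands inside the window */

	current_vmcs = saved;		/* 4. vmptrld: restore current-vmcs */
}
```

Anything that runs at step 3 sees the invalid pointer, while code running
after step 4 sees the original one. Disabling interrupts closes that window
for IRQs only, which is why the comment has to rule out NMI users.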

>
>> +	local_irq_save(flags);
>> +
>> +	asm goto("1: vmptrst %0\n\t"
>> +		 _ASM_EXTABLE(1b, %l[error])
>> +		 : "=m" (vmcs) : : "cc" : error);
>
>I'd much rather this be wrapped up in a helper function. We shouldn't
>have to look at the horrors of inline assembly like this.
>
>But this *REALLY* wants the KVM folks to look at it. One argument is
>that with the inline assembly this is nice and self-contained. The other
>argument is that this completely ignores all existing KVM infrastructure
>and is parallel VMCS management.

Exactly. Sean suggested this approach [*]. He prefers inline assembly to
adding new, inferior wrappers.

*: https://lore.kernel.org/linux-coco/aHEYtGgA3aIQ7A3y@google.com/

>
>I'd be shocked if this is the one and only place in the whole kernel
>that can unceremoniously zap VMX state.
>
>I'd *bet* that you don't really need to do the vmptrld and that KVM can
>figure it out because it can vmptrld on demand anyway. Something along
>the lines of:
>
>	local_irq_disable();
>	list_for_each(handwaving...)
>		vmcs_clear();
>	ret = seamldr_prerr(fn, args);
>	local_irq_enable();	
>
>Basically, zap this CPU's vmcs state and then make KVM reload it at some
>later time.

The idea is feasible. But just calling vmcs_clear() won't work. We need to
reset all the tracking state associated with each VMCS. We should call
vmclear_local_loaded_vmcss() instead, similar to what's done before VMXOFF.

>
>I'm sure Sean and Paolo will tell me if I'm crazy.

To me, this approach needs more work since we need to either move
vmclear_local_loaded_vmcss() out of KVM into the core kernel or allow KVM to
register a callback.

I don't think it's as straightforward as just doing the save/restore.

>
>> diff --git a/drivers/virt/coco/tdx-host/Kconfig b/drivers/virt/coco/tdx-host/Kconfig
>> index e58bad148a35..6a9199e6c2c6 100644
>> --- a/drivers/virt/coco/tdx-host/Kconfig
>> +++ b/drivers/virt/coco/tdx-host/Kconfig
>> @@ -8,3 +8,13 @@ config TDX_HOST_SERVICES
>>  
>>  	  Say y or m if enabling support for confidential virtual machine
>>  	  support (CONFIG_INTEL_TDX_HOST). The module is called tdx_host.ko
>> +
>> +config INTEL_TDX_MODULE_UPDATE
>> +	bool "Intel TDX module runtime update"
>> +	depends on TDX_HOST_SERVICES
>> +	help
>> +	  This enables the kernel to support TDX module runtime update. This
>> +	  allows the admin to update the TDX module to another compatible
>> +	  version without the need to terminate running TDX guests.
>
>... as opposed to the method that the kernel has to update the module
>without terminating guests? ;)

I will reduce this to:

	  This enables the kernel to update the TDX Module to another compatible
	  version.


>
>> +	  If unsure, say N.
>
>Let's call this:
>
> config
>INTEL_TDX_ONLY_DISABLE_THIS_IF_YOU_HATE_SECURITY_AND_IF_YOU_DO_WHY_ARE_YOU_RUNNING_TDX?
>
>Can we have question marks in config symbol names? ;)
>
>But, seriously, what the heck? Who would disable security updates for
>their confidential computing infrastructure? Is this some kind of
>intelligence test for our users so that if someone disables it we can
>just laugh at them?

Looks like I failed that test! ;) I'll make it default to 'y' and recommend
saying Y if unsure.