linux-kernel - Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs

Message-ID: <aYHmUCLRYL+JX1ga@intel.com>
Date: Tue, 3 Feb 2026 20:15:57 +0800
From: Chao Gao <chao.gao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: <linux-coco@...ts.linux.dev>, <linux-kernel@...r.kernel.org>,
	<kvm@...r.kernel.org>, <x86@...nel.org>, <reinette.chatre@...el.com>,
	<ira.weiny@...el.com>, <kai.huang@...el.com>, <dan.j.williams@...el.com>,
	<yilun.xu@...ux.intel.com>, <sagis@...gle.com>, <vannapurve@...gle.com>,
	<paulmck@...nel.org>, <nik.borisov@...e.com>, <zhenzhong.duan@...el.com>,
	<seanjc@...gle.com>, <rick.p.edgecombe@...el.com>, <kas@...nel.org>,
	<dave.hansen@...ux.intel.com>, <vishal.l.verma@...el.com>, Farrah Chen
	<farrah.chen@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
	<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, "H. Peter Anvin"
	<hpa@...or.com>
Subject: Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for
 P-SEAMLDR SEAMCALLs

>>> I'd be shocked if this is the one and only place in the whole kernel
>>> that can unceremoniously zap VMX state.
>>>
>>> I'd *bet* that you don't really need to do the vmptrld and that KVM can
>>> figure it out because it can vmptrld on demand anyway. Something along
>>> the lines of:
>>>
>>> 	local_irq_disable();
>>> 	list_for_each(handwaving...)
>>> 		vmcs_clear();
>>> 	ret = seamldr_prerr(fn, args);
>>> 	local_irq_enable();	
>>>
>>> Basically, zap this CPU's vmcs state and then make KVM reload it at some
>>> later time.
>> 
>> The idea is feasible. But just calling vmcs_clear() won't work. We need to
>> reset all the tracking state associated with each VMCS. We should call
>> vmclear_local_loaded_vmcss() instead, similar to what's done before VMXOFF.
>> 
>>>
>>> I'm sure Sean and Paolo will tell me if I'm crazy.
>> 
>> To me, this approach needs more work since we need to either move 
>> vmclear_local_loaded_vmcss() to the kernel or allow KVM to register a callback.
>> 
>> I don't think it's as straightforward as just doing the save/restore.
>
>Could you please just do me a favor and spend 20 minutes to see what
>this looks like in practice and if the KVM folks hate it?

Sure. KVM tracks the current VMCS per CPU and executes vmptrld only when the
VMCS being loaded differs from the cached one. See
arch/x86/kvm/vmx/vmx.c::vmx_vcpu_load_vmcs():

	prev = per_cpu(current_vmcs, cpu);
	if (prev != vmx->loaded_vmcs->vmcs) {
		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
		vmcs_load(vmx->loaded_vmcs->vmcs);
	}

Resetting current_vmcs to NULL after each P-SEAMLDR call forces KVM to do a
vmptrld on the next VMCS load. So, we can implement seamldr_call() as:

static int seamldr_call(u64 fn, struct tdx_module_args *args)
{
	int ret;

	WARN_ON_ONCE(!is_seamldr_call(fn));

	/*
	 * Serialize P-SEAMLDR calls since only a single CPU is allowed to
	 * interact with P-SEAMLDR at a time.
	 *
	 * P-SEAMLDR calls invalidate the current VMCS. Exclude KVM access to
	 * the VMCS by disabling interrupts. This is not safe against VMCS use
	 * in NMIs, but there are none of those today.
	 *
	 * Set the per-CPU current_vmcs cache to NULL to force KVM to reload
	 * the VMCS.
	 */
	guard(raw_spinlock_irqsave)(&seamldr_lock);
	ret = seamcall_prerr(fn, args);
	this_cpu_write(current_vmcs, NULL);

	return ret;
}

This requires moving the per-CPU current_vmcs from KVM to the kernel, which
should be trivial with Sean's VMXON series.

I also tested this: without the this_cpu_write(), vmread/vmwrite errors occur
after TDX Module updates; with it, there are no errors.