Message-ID: <aYHmUCLRYL+JX1ga@intel.com>
Date: Tue, 3 Feb 2026 20:15:57 +0800
From: Chao Gao <chao.gao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: <linux-coco@...ts.linux.dev>, <linux-kernel@...r.kernel.org>,
<kvm@...r.kernel.org>, <x86@...nel.org>, <reinette.chatre@...el.com>,
<ira.weiny@...el.com>, <kai.huang@...el.com>, <dan.j.williams@...el.com>,
<yilun.xu@...ux.intel.com>, <sagis@...gle.com>, <vannapurve@...gle.com>,
<paulmck@...nel.org>, <nik.borisov@...e.com>, <zhenzhong.duan@...el.com>,
<seanjc@...gle.com>, <rick.p.edgecombe@...el.com>, <kas@...nel.org>,
<dave.hansen@...ux.intel.com>, <vishal.l.verma@...el.com>, Farrah Chen
<farrah.chen@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, "H. Peter Anvin"
<hpa@...or.com>
Subject: Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for
P-SEAMLDR SEAMCALLs
>>> I'd be shocked if this is the one and only place in the whole kernel
>>> that can unceremoniously zap VMX state.
>>>
>>> I'd *bet* that you don't really need to do the vmptrld and that KVM can
>>> figure it out because it can vmptrld on demand anyway. Something along
>>> the lines of:
>>>
>>> 	local_irq_disable();
>>> 	list_for_each(handwaving...)
>>> 		vmcs_clear();
>>> 	ret = seamldr_prerr(fn, args);
>>> 	local_irq_enable();
>>>
>>> Basically, zap this CPU's vmcs state and then make KVM reload it at some
>>> later time.
>>
>> The idea is feasible. But just calling vmcs_clear() won't work. We need to
>> reset all the tracking state associated with each VMCS. We should call
>> vmclear_local_loaded_vmcss() instead, similar to what's done before VMXOFF.
>>
>>>
>>> I'm sure Sean and Paolo will tell me if I'm crazy.
>>
>> To me, this approach needs more work since we need to either move
>> vmclear_local_loaded_vmcss() to the kernel or allow KVM to register a callback.
>>
>> I don't think it's as straightforward as just doing the save/restore.
>
>Could you please just do me a favor and spend 20 minutes to see what
>this looks like in practice and if the KVM folks hate it?

Sure. KVM tracks the current VMCS and only executes vmptrld for a new VMCS if
it differs from the current one. See arch/x86/kvm/vmx/vmx.c::vmx_vcpu_load_vmcs():

	prev = per_cpu(current_vmcs, cpu);
	if (prev != vmx->loaded_vmcs->vmcs) {
		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
		vmcs_load(vmx->loaded_vmcs->vmcs);
	}

By resetting current_vmcs to NULL during P-SEAMLDR calls, KVM is forced to do a
vmptrld on the next VMCS load. So, we can implement seamldr_call() as:

static int seamldr_call(u64 fn, struct tdx_module_args *args)
{
	int ret;

	WARN_ON_ONCE(!is_seamldr_call(fn));

	/*
	 * Serialize P-SEAMLDR calls since only a single CPU is allowed to
	 * interact with P-SEAMLDR at a time.
	 *
	 * P-SEAMLDR calls invalidate the current VMCS. Exclude KVM access to
	 * the VMCS by disabling interrupts. This is not safe against VMCS use
	 * in NMIs, but there are none of those today.
	 *
	 * Set the per-CPU current_vmcs cache to NULL to force KVM to reload
	 * the VMCS.
	 */
	guard(raw_spinlock_irqsave)(&seamldr_lock);
	ret = seamcall_prerr(fn, args);
	this_cpu_write(current_vmcs, NULL);

	return ret;
}

This requires moving the per-CPU current_vmcs from KVM to the kernel, which
should be trivial with Sean's VMXON series.

I tested this. Without the this_cpu_write(), vmread/vmwrite errors occur after
TDX Module updates; with it, no errors occur.
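
To show the effect in isolation, here is a minimal userspace sketch of the
cache-invalidation idea. It is not kernel code and only models the behavior
described above. The model_* functions and the hw_current and vmptrld_count
variables are hypothetical names made up for illustration; only current_vmcs
and the "reload if the cached pointer differs" check mirror the
vmx_vcpu_load_vmcs() snippet.

```c
#include <stddef.h>

/* Hypothetical userspace model; none of the model_* names are kernel APIs. */
struct vmcs { int id; };

static struct vmcs *current_vmcs;	/* models the per-CPU current_vmcs cache */
static struct vmcs *hw_current;		/* models the VMCS pointer held by the CPU */
static int vmptrld_count;		/* number of modeled vmptrld executions */

/* Models vmptrld: make @v the hardware-current VMCS. */
static void model_vmptrld(struct vmcs *v)
{
	hw_current = v;
	vmptrld_count++;
}

/* Models the cache check: only reload when the cached pointer differs. */
static void model_load_vmcs(struct vmcs *v)
{
	if (current_vmcs != v) {
		current_vmcs = v;
		model_vmptrld(v);
	}
}

/*
 * Models a P-SEAMLDR call: the hardware loses its current VMCS. When
 * @reset_cache is nonzero, the software cache is cleared as well, which
 * corresponds to the this_cpu_write(current_vmcs, NULL) step.
 */
static void model_seamldr_call(int reset_cache)
{
	hw_current = NULL;
	if (reset_cache)
		current_vmcs = NULL;
}

/*
 * Load a VMCS, do a modeled P-SEAMLDR call, then load the same VMCS again.
 * Returns 1 if the hardware ends up pointing at the VMCS, 0 if it is stale.
 */
int model_run(int reset_cache)
{
	static struct vmcs v = { .id = 1 };

	current_vmcs = NULL;
	hw_current = NULL;
	vmptrld_count = 0;

	model_load_vmcs(&v);
	model_seamldr_call(reset_cache);
	model_load_vmcs(&v);

	return hw_current == &v;
}
```

In this model, model_run(0) returns 0 because the second load is skipped: the
cache still claims the VMCS is current even though the hardware no longer has
it. model_run(1) returns 1 because clearing the cache forces a second modeled
vmptrld.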