Message-ID: <fedb3192-e68c-423c-93b2-a4dc2f964148@intel.com>
Date: Fri, 30 Jan 2026 08:18:07 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Chao Gao <chao.gao@...el.com>
Cc: linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, x86@...nel.org, reinette.chatre@...el.com,
ira.weiny@...el.com, kai.huang@...el.com, dan.j.williams@...el.com,
yilun.xu@...ux.intel.com, sagis@...gle.com, vannapurve@...gle.com,
paulmck@...nel.org, nik.borisov@...e.com, zhenzhong.duan@...el.com,
seanjc@...gle.com, rick.p.edgecombe@...el.com, kas@...nel.org,
dave.hansen@...ux.intel.com, vishal.l.verma@...el.com,
Farrah Chen <farrah.chen@...el.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for
P-SEAMLDR SEAMCALLs

On 1/30/26 05:21, Chao Gao wrote:
...
>>> + /*
>>> + * SEAMRET from P-SEAMLDR invalidates the current VMCS. Save/restore
>>> + * the VMCS across P-SEAMLDR SEAMCALLs to avoid clobbering KVM state.
>>> + * Disable interrupts as KVM is allowed to do VMREAD/VMWRITE in IRQ
>>> + * context (but not NMI context).
>>> + */
>>
>> I think you mean:
>>
>> WARN_ON(in_nmi());
>
> This function only disables interrupts, not NMIs. Kirill questioned whether any
> KVM operations might execute from NMI context and do VMREAD/VMWRITE. If such
> operations exist and an NMI interrupts seamldr_call(), they could encounter
> an invalid current VMCS.
>
> The problematic scenario is:
>
>   seamldr_call()                      KVM code in NMI handler
>
>   1. vmptrst  // save current-vmcs
>   2. seamcall // clobber current-vmcs
>   3.                                  // NMI handler start
>                                       call into some KVM code and do vmread/vmwrite
>                                       // consume __invalid__ current-vmcs
>                                       // NMI handler end
>   4. vmptrld  // restore current-vmcs
>
> The comment clarifies that KVM doesn't do VMREAD/VMWRITE during NMI handling.

How about something like:

	P-SEAMLDR calls invalidate the current VMCS. It must be saved
	and restored around the call. Exclude KVM access to the VMCS
	by disabling interrupts. This is not safe against VMCS use in
	NMIs, but there are none of those today.

Ideally, you'd also pair that with _some_ checks in the KVM code that
use lockdep or warnings to reiterate that NMI access to the VMCS is not OK.
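
A minimal sketch of what such a check could look like, written as a
user-space model so it compiles and runs on its own. in_nmi() and
WARN_ON_ONCE() are real kernel facilities, but the versions below are
stand-in stubs. vmcs_access_allowed() and model_vmread() are hypothetical
names used only for illustration, not existing KVM functions:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's in_nmi(): here it is just a flag. */
static bool model_in_nmi;
static bool in_nmi(void) { return model_in_nmi; }

/* Stand-in for WARN_ON_ONCE(): report only the first hit, return cond. */
static int warn_count;
static bool warn_on_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Hypothetical check that a VMCS accessor could make on entry. */
static bool vmcs_access_allowed(void)
{
	return !warn_on_once(in_nmi(), "VMCS access from NMI context");
}

/* Hypothetical accessor: refuses to touch the VMCS from NMI context. */
static bool model_vmread(unsigned long *value)
{
	if (!vmcs_access_allowed())
		return false;
	*value = 0;	/* a real accessor would execute VMREAD here */
	return true;
}
```

The point is only that the "no VMCS access from NMIs" rule becomes
something the code complains about, rather than something a comment
asserts.
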
>>> + local_irq_save(flags);
>>> +
>>> + asm goto("1: vmptrst %0\n\t"
>>> + _ASM_EXTABLE(1b, %l[error])
>>> + : "=m" (vmcs) : : "cc" : error);
>>
>> I'd much rather this be wrapped up in a helper function. We shouldn't
>> have to look at the horrors of inline assembly like this.
>>
>> But this *REALLY* wants the KVM folks to look at it. One argument is
>> that with the inline assembly this is nice and self-contained. The other
>> argument is that this completely ignores all existing KVM infrastructure
>> and is parallel VMCS management.
>
> Exactly. Sean suggested this approach [*]. He prefers inline assembly rather than
> adding new, inferior wrappers
>
> *: https://lore.kernel.org/linux-coco/aHEYtGgA3aIQ7A3y@google.com/

Get his explicit reviews on the patch, please.

Also, I 100% object to inline assembly in the main flow. Please at least
make a wrapper for these and stick them in:

	arch/x86/include/asm/special_insns.h

so the inline assembly spew is hidden from view.

>> I'd be shocked if this is the one and only place in the whole kernel
>> that can unceremoniously zap VMX state.
>>
>> I'd *bet* that you don't really need to do the vmptrld and that KVM can
>> figure it out because it can vmptrld on demand anyway. Something along
>> the lines of:
>>
>> local_irq_disable();
>> list_for_each(handwaving...)
>> vmcs_clear();
>> ret = seamldr_prerr(fn, args);
>> local_irq_enable();
>>
>> Basically, zap this CPU's vmcs state and then make KVM reload it at some
>> later time.
>
> The idea is feasible. But just calling vmcs_clear() won't work. We need to
> reset all the tracking state associated with each VMCS. We should call
> vmclear_local_loaded_vmcss() instead, similar to what's done before VMXOFF.
>
>>
>> I'm sure Sean and Paolo will tell me if I'm crazy.
>
> To me, this approach needs more work since we need to either move
> vmclear_local_loaded_vmcss() to the kernel or allow KVM to register a callback.
>
> I don't think it's as straightforward as just doing the save/restore.

Could you please just do me a favor and spend 20 minutes to see what
this looks like in practice and if the KVM folks hate it?
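
To make the shape of that alternative concrete, here is a toy user-space
model of "clear this CPU's VMCS state, make the call, and let the next
user reload on demand". Every name in it (struct model_vmcs, struct
model_cpu, model_update() and friends) is hypothetical. It models only
the bookkeeping, executes no VMX instructions, and is not a claim about
how KVM tracks VMCSs today:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMCS 4

/* Toy stand-in for one VMCS and the tracking state kept alongside it. */
struct model_vmcs {
	bool loaded_on_cpu;	/* on this CPU's "loaded" list */
	bool launched;		/* reset whenever the VMCS is cleared */
};

/* Toy stand-in for one CPU: the VMCSs it knows and the current one. */
struct model_cpu {
	struct model_vmcs vmcs[MAX_VMCS];
	struct model_vmcs *cur;	/* NULL models "no valid current VMCS" */
	int reloads;		/* number of modelled VMPTRLDs */
};

/* Models VMCLEAR plus the tracking reset for every VMCS on this CPU. */
static void model_clear_loaded_vmcss(struct model_cpu *cpu)
{
	for (size_t i = 0; i < MAX_VMCS; i++) {
		cpu->vmcs[i].loaded_on_cpu = false;
		cpu->vmcs[i].launched = false;
	}
	cpu->cur = NULL;
}

/* Models the P-SEAMLDR call: it leaves no valid current VMCS behind. */
static void model_seamldr_call(struct model_cpu *cpu)
{
	cpu->cur = NULL;
}

/* Models the on-demand load: reload only if this VMCS is not current. */
static void model_load_vmcs(struct model_cpu *cpu, struct model_vmcs *v)
{
	if (cpu->cur == v)
		return;
	v->loaded_on_cpu = true;
	cpu->cur = v;
	cpu->reloads++;
}

/* The flow: clear everything, make the call, restore nothing. */
static void model_update(struct model_cpu *cpu)
{
	/* In the kernel, interrupts would be disabled around these two. */
	model_clear_loaded_vmcss(cpu);
	model_seamldr_call(cpu);
}
```

Nothing is restored after the call. The cost moves to the next user of a
VMCS, which pays for one reload.
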
>>> diff --git a/drivers/virt/coco/tdx-host/Kconfig b/drivers/virt/coco/tdx-host/Kconfig
>>> index e58bad148a35..6a9199e6c2c6 100644
>>> --- a/drivers/virt/coco/tdx-host/Kconfig
>>> +++ b/drivers/virt/coco/tdx-host/Kconfig
>>> @@ -8,3 +8,13 @@ config TDX_HOST_SERVICES
>>>
>>> Say y or m if enabling support for confidential virtual machine
>>> support (CONFIG_INTEL_TDX_HOST). The module is called tdx_host.ko
>>> +
>>> +config INTEL_TDX_MODULE_UPDATE
>>> + bool "Intel TDX module runtime update"
>>> + depends on TDX_HOST_SERVICES
>>> + help
>>> + This enables the kernel to support TDX module runtime update. This
>>> + allows the admin to update the TDX module to another compatible
>>> + version without the need to terminate running TDX guests.
>>
>> ... as opposed to the method that the kernel has to update the module
>> without terminating guests? ;)
>
> I will reduce this to:
>
> This enables the kernel to update the TDX Module to another compatible
> version.

I guess I'll be explicit: Remove this Kconfig prompt.

I think you should remove INTEL_TDX_MODULE_UPDATE entirely. But I'll
settle for:

	config INTEL_TDX_MODULE_UPDATE
		bool
		default TDX_HOST_SERVICES

so that users don't have to see it. Don't bother users with it. Period.