[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aXywVcqbXodADg4a@intel.com>
Date: Fri, 30 Jan 2026 21:21:25 +0800
From: Chao Gao <chao.gao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: <linux-coco@...ts.linux.dev>, <linux-kernel@...r.kernel.org>,
<kvm@...r.kernel.org>, <x86@...nel.org>, <reinette.chatre@...el.com>,
<ira.weiny@...el.com>, <kai.huang@...el.com>, <dan.j.williams@...el.com>,
<yilun.xu@...ux.intel.com>, <sagis@...gle.com>, <vannapurve@...gle.com>,
<paulmck@...nel.org>, <nik.borisov@...e.com>, <zhenzhong.duan@...el.com>,
<seanjc@...gle.com>, <rick.p.edgecombe@...el.com>, <kas@...nel.org>,
<dave.hansen@...ux.intel.com>, <vishal.l.verma@...el.com>, Farrah Chen
<farrah.chen@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, "H. Peter Anvin"
<hpa@...or.com>
Subject: Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for
P-SEAMLDR SEAMCALLs
On Wed, Jan 28, 2026 at 03:36:49PM -0800, Dave Hansen wrote:
>On 1/23/26 06:55, Chao Gao wrote:
>...
>> +static __maybe_unused int seamldr_call(u64 fn, struct tdx_module_args *args)
>> +{
>> + unsigned long flags;
>> + u64 vmcs;
>> + int ret;
>> +
>> + if (!is_seamldr_call(fn))
>> + return -EINVAL;
>
>Why is this here? We shouldn't be silently papering over kernel bugs.
>This is a WARN_ON() at *best*, but it also begs the question of how a
>non-SEAMLDR call even got here.
Only SEAMLDR calls can get here. I will make it a WARN_ON_ONCE().
>
>> + /*
>> + * SEAMRET from P-SEAMLDR invalidates the current VMCS. Save/restore
>> + * the VMCS across P-SEAMLDR SEAMCALLs to avoid clobbering KVM state.
>> + * Disable interrupts as KVM is allowed to do VMREAD/VMWRITE in IRQ
>> + * context (but not NMI context).
>> + */
>
>I think you mean:
>
> WARN_ON(in_nmi());
This function only disables interrupts, not NMIs. Kirill questioned whether any
KVM operations might execute from NMI context and do VMREAD/VMWRITE. If such
operations exist and an NMI interrupts seamldr_call(), they could encounter
an invalid current VMCS.
The problematic scenario is:
seamldr_call() KVM code in NMI handler
1. vmptrst // save current-vmcs
2. seamcall // clobber current-vmcs
3. // NMI handler start
call into some KVM code and do vmread/vmwrite
// consume __invalid__ current-vmcs
// NMI handler end
4. vmptrld // restore current-vmcs
The comment clarifies that KVM doesn't do VMREAD/VMWRITE during NMI handling.
>
>> + local_irq_save(flags);
>> +
>> + asm goto("1: vmptrst %0\n\t"
>> + _ASM_EXTABLE(1b, %l[error])
>> + : "=m" (vmcs) : : "cc" : error);
>
>I'd much rather this be wrapped up in a helper function. We shouldn't
>have to look at the horrors of inline assembly like this.
>
>But this *REALLY* wants the KVM folks to look at it. One argument is
>that with the inline assembly this is nice and self-contained. The other
>argument is that this completely ignores all existing KVM infrastructure
>and is parallel VMCS management.
Exactly. Sean suggested this approach [*]. He prefers inline assembly rather than
adding new, inferior wrappers
*: https://lore.kernel.org/linux-coco/aHEYtGgA3aIQ7A3y@google.com/
>
>I'd be shocked if this is the one and only place in the whole kernel
>that can unceremoniously zap VMX state.
>
>I'd *bet* that you don't really need to do the vmptrld and that KVM can
>figure it out because it can vmptrld on demand anyway. Something along
>the lines of:
>
> local_irq_disable();
> list_for_each(handwaving...)
> vmcs_clear();
> ret = seamldr_prerr(fn, args);
> local_irq_enable();
>
>Basically, zap this CPU's vmcs state and then make KVM reload it at some
>later time.
The idea is feasible. But just calling vmcs_clear() won't work. We need to
reset all the tracking state associated with each VMCS. We should call
vmclear_local_loaded_vmcss() instead, similar to what's done before VMXOFF.
>
>I'm sure Sean and Paolo will tell me if I'm crazy.
To me, this approach needs more work since we need to either move
vmclear_local_loaded_vmcss() to the kernel or allow KVM to register a callback.
I don't think it's as straightforward as just doing the save/restore.
>
>> diff --git a/drivers/virt/coco/tdx-host/Kconfig b/drivers/virt/coco/tdx-host/Kconfig
>> index e58bad148a35..6a9199e6c2c6 100644
>> --- a/drivers/virt/coco/tdx-host/Kconfig
>> +++ b/drivers/virt/coco/tdx-host/Kconfig
>> @@ -8,3 +8,13 @@ config TDX_HOST_SERVICES
>>
>> Say y or m if enabling support for confidential virtual machine
>> support (CONFIG_INTEL_TDX_HOST). The module is called tdx_host.ko
>> +
>> +config INTEL_TDX_MODULE_UPDATE
>> + bool "Intel TDX module runtime update"
>> + depends on TDX_HOST_SERVICES
>> + help
>> + This enables the kernel to support TDX module runtime update. This
>> + allows the admin to update the TDX module to another compatible
>> + version without the need to terminate running TDX guests.
>
>... as opposed to the method that the kernel has to update the module
>without terminating guests? ;)
I will reduce this to:
This enables the kernel to update the TDX Module to another compatible
version.
>
>> + If unsure, say N.
>
>Let's call this:
>
> config
>INTEL_TDX_ONLY_DISABLE_THIS_IF_YOU_HATE_SECURITY_AND_IF_YOU_DO_WHY_ARE_YOU_RUNNING_TDX?
>
>Can we have question marks in config symbol names? ;)
>
>But, seriously, what the heck? Who would disable security updates for
>their confidential computing infrastructure? Is this some kind of
>intelligence test for our users so that if someone disables it we can
>just laugh at them?
Looks like I failed that test! ;) I'll change it to default to 'y' and
recommend 'Y' if unsure.
Powered by blists - more mailing lists