Message-ID: <20250816144436.83718-2-adrian.hunter@intel.com>
Date: Sat, 16 Aug 2025 17:44:34 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: pbonzini@...hat.com,
seanjc@...gle.com
Cc: kvm@...r.kernel.org,
rick.p.edgecombe@...el.com,
kirill.shutemov@...ux.intel.com,
kai.huang@...el.com,
reinette.chatre@...el.com,
xiaoyao.li@...el.com,
tony.lindgren@...ux.intel.com,
binbin.wu@...ux.intel.com,
isaku.yamahata@...el.com,
linux-kernel@...r.kernel.org,
yan.y.zhao@...el.com,
chao.gao@...el.com,
ira.weiny@...el.com
Subject: [PATCH RFC 1/2] KVM: TDX: Disable general support for MWAIT in guest
TDX support for using the MWAIT instruction in a guest has issues, so
disable it for now.
Background
Like VMX, TDX can allow the MWAIT instruction to be executed in a guest.
Unlike VMX, however, TDX cannot necessarily virtualize the MSRs that a
guest might reasonably expect to exist alongside MWAIT support.
For example, in the case of a Linux guest, the default idle driver
intel_idle may access MSR_POWER_CTL or MSR_PKG_CST_CONFIG_CONTROL. To
virtualize those, KVM would need the guest not to enable #VE reduction,
which is not something that KVM can control or even be aware of. Note,
however, that the consequent unchecked MSR access errors might be harmless.
Without #VE reduction enabled, the TDX Module injects #VE for MSRs that
it does not virtualize itself. The guest can then make a hypercall to the
host VMM to resolve the access.
With #VE reduction enabled, accessing MSRs such as the two above results
in the TDX Module injecting #GP.
Currently, a Linux guest opts for #VE reduction unconditionally if it is
available; refer to reduce_unnecessary_ve(). However, the #VE reduction
feature was not added to the TDX Module until versions 1.5.09 and 2.0.04.
Refer to https://github.com/intel/tdx-module/releases
A Linux guest also experiences a further issue. Prior to TDX Module
versions 1.5.09 and 2.0.04, the Always-Running-APIC-Timer (ARAT) feature
(CPUID leaf 6: EAX bit 2) was not exposed. As a result, cpuidle disables
the timer interrupt and invokes the Tick Broadcast framework to provide a
wake-up. Currently, that falls back to the PIT, which does not work for
TDX, so the guest becomes stuck in the idle loop.
Conclusion
Users may expect TDX support for MWAIT in a guest to be similar to VMX
support, but KVM cannot ensure that. Consequently, KVM should not expose
the capability.
Fixes: 0186dd29a2518 ("KVM: TDX: add ioctl to initialize VM with TDX specific parameters")
Signed-off-by: Adrian Hunter <adrian.hunter@...el.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx/tdx.c | 22 +++++++++++++++++++++-
arch/x86/kvm/x86.c | 8 +++++---
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7af967aa16f..9c8617217adb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1398,6 +1398,8 @@ struct kvm_arch {
gpa_t wall_clock;
+ u64 unsupported_disable_exits;
+
bool mwait_in_guest;
bool hlt_in_guest;
bool pause_in_guest;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9ad460ef97b0..cdf0dc6cf068 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -132,6 +132,17 @@ static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
}
+static bool has_mwait(const struct kvm_cpuid_entry2 *entry)
+{
+ return entry->function == 1 &&
+ (entry->ecx & __feature_bit(X86_FEATURE_MWAIT));
+}
+
+static void clear_mwait(struct kvm_cpuid_entry2 *entry)
+{
+ entry->ecx &= ~__feature_bit(X86_FEATURE_MWAIT);
+}
+
static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
{
if (has_tsx(entry))
@@ -139,11 +150,15 @@ static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
if (has_waitpkg(entry))
clear_waitpkg(entry);
+
+ /* Also KVM_X86_DISABLE_EXITS_MWAIT is disallowed in tdx_vm_init() */
+ if (has_mwait(entry))
+ clear_mwait(entry);
}
static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
{
- return has_tsx(entry) || has_waitpkg(entry);
+ return has_tsx(entry) || has_waitpkg(entry) || has_mwait(entry);
}
#define KVM_TDX_CPUID_NO_SUBLEAF ((__u32)-1)
@@ -615,6 +630,11 @@ int tdx_vm_init(struct kvm *kvm)
kvm->arch.has_protected_state = true;
kvm->arch.has_private_mem = true;
kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+ /*
+ * TDX support for using the MWAIT instruction in a guest has issues,
+ * so disable it for now. See also tdx_clear_unsupported_cpuid().
+ */
+ kvm->arch.unsupported_disable_exits |= KVM_X86_DISABLE_EXITS_MWAIT;
/*
* Because guest TD is protected, VMM can't parse the instruction in TD.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93636f77c42d..bfd4f52286b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4575,7 +4575,7 @@ static inline bool kvm_can_mwait_in_guest(void)
boot_cpu_has(X86_FEATURE_ARAT);
}
-static u64 kvm_get_allowed_disable_exits(void)
+static u64 kvm_get_allowed_disable_exits(struct kvm *kvm)
{
u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
@@ -4586,6 +4586,8 @@ static u64 kvm_get_allowed_disable_exits(void)
if (kvm_can_mwait_in_guest())
r |= KVM_X86_DISABLE_EXITS_MWAIT;
}
+ if (kvm)
+ r &= ~kvm->arch.unsupported_disable_exits;
return r;
}
@@ -4736,7 +4738,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_CLOCK_VALID_FLAGS;
break;
case KVM_CAP_X86_DISABLE_EXITS:
- r = kvm_get_allowed_disable_exits();
+ r = kvm_get_allowed_disable_exits(kvm);
break;
case KVM_CAP_X86_SMM:
if (!IS_ENABLED(CONFIG_KVM_SMM))
@@ -6613,7 +6615,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
break;
case KVM_CAP_X86_DISABLE_EXITS:
r = -EINVAL;
- if (cap->args[0] & ~kvm_get_allowed_disable_exits())
+ if (cap->args[0] & ~kvm_get_allowed_disable_exits(kvm))
break;
mutex_lock(&kvm->lock);
--
2.48.1