Message-ID: <20250816144436.83718-2-adrian.hunter@intel.com>
Date: Sat, 16 Aug 2025 17:44:34 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: pbonzini@...hat.com,
	seanjc@...gle.com
Cc: kvm@...r.kernel.org,
	rick.p.edgecombe@...el.com,
	kirill.shutemov@...ux.intel.com,
	kai.huang@...el.com,
	reinette.chatre@...el.com,
	xiaoyao.li@...el.com,
	tony.lindgren@...ux.intel.com,
	binbin.wu@...ux.intel.com,
	isaku.yamahata@...el.com,
	linux-kernel@...r.kernel.org,
	yan.y.zhao@...el.com,
	chao.gao@...el.com,
	ira.weiny@...el.com
Subject: [PATCH RFC 1/2] KVM: TDX: Disable general support for MWAIT in guest

TDX support for using the MWAIT instruction in a guest has issues, so
disable it for now.

Background

Like VMX, TDX can allow the MWAIT instruction to be executed in a guest.
Unlike VMX, however, TDX cannot always virtualize the MSRs that a guest
might reasonably expect to exist alongside MWAIT.

For example, in a Linux guest, the default idle driver, intel_idle, may
access MSR_POWER_CTL or MSR_PKG_CST_CONFIG_CONTROL.  To virtualize those,
KVM would need the guest to leave #VE reduction disabled, which KVM can
neither control nor even detect.  Note, however, that the resulting
unchecked MSR access errors might be harmless.

Without #VE reduction enabled, the TDX Module will inject #VE for MSRs that
it does not virtualize itself.  The guest can then hypercall the host VMM
for a resolution.

With #VE reduction enabled, accessing MSRs such as the two above results
in the TDX Module injecting #GP.
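
As a rough illustration, the two cases above can be modeled as a small C
decision function.  This is a simplified sketch, not TDX Module code: the
names td_msr_access() and host_vmm_sees_access() are hypothetical, and it
assumes the outcome depends only on whether the module virtualizes the MSR
and whether the guest has enabled #VE reduction.

```c
#include <stdbool.h>

/* Simplified outcomes of a guest RDMSR/WRMSR inside a TD (hypothetical model). */
enum msr_outcome {
	MSR_HANDLED_BY_MODULE,	/* the TDX Module virtualizes the MSR itself */
	MSR_INJECT_VE,		/* #VE: the guest may hypercall the host VMM */
	MSR_INJECT_GP,		/* #GP: the access simply faults in the guest */
};

/* Models the two paragraphs above; real TDX Module behavior is per-MSR. */
static enum msr_outcome td_msr_access(bool module_virtualizes_msr,
				      bool ve_reduction_enabled)
{
	if (module_virtualizes_msr)
		return MSR_HANDLED_BY_MODULE;
	return ve_reduction_enabled ? MSR_INJECT_GP : MSR_INJECT_VE;
}

/* Only the #VE path gives the host VMM (and so KVM) a chance to emulate. */
static bool host_vmm_sees_access(enum msr_outcome outcome)
{
	return outcome == MSR_INJECT_VE;
}
```

For an MSR the module does not virtualize, td_msr_access(false, true)
yields MSR_INJECT_GP, so KVM never gets an opportunity to handle it.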

Currently, a Linux guest opts in to #VE reduction unconditionally if it is
available; refer to reduce_unnecessary_ve().  However, the #VE reduction
feature was not added to the TDX Module until versions 1.5.09 and 2.0.04.
Refer to https://github.com/intel/tdx-module/releases
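
To make the version dependency concrete, here is a hypothetical C helper
that checks a TDX Module version against the releases named above.  The
struct and function names are illustrative only, and treating every later
release on each line (and any major version above 2) as having #VE
reduction is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical representation of a TDX Module version, e.g. 1.5.09. */
struct tdx_module_version {
	unsigned int major, minor, update;
};

/* Lexicographic comparison: returns <0, 0 or >0, like strcmp(). */
static int tdx_version_cmp(struct tdx_module_version a,
			   struct tdx_module_version b)
{
	if (a.major != b.major)
		return a.major < b.major ? -1 : 1;
	if (a.minor != b.minor)
		return a.minor < b.minor ? -1 : 1;
	if (a.update != b.update)
		return a.update < b.update ? -1 : 1;
	return 0;
}

/*
 * True if the module is at or above the first release on its line that
 * added #VE reduction: 1.5.09 for 1.x, 2.0.04 for 2.x (and, by assumption,
 * anything newer).
 */
static bool tdx_has_ve_reduction(struct tdx_module_version v)
{
	static const struct tdx_module_version v1_min = { 1, 5, 9 };
	static const struct tdx_module_version v2_min = { 2, 0, 4 };

	if (v.major < 2)
		return tdx_version_cmp(v, v1_min) >= 0;
	return tdx_version_cmp(v, v2_min) >= 0;
}
```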

There is also a further issue experienced by a Linux guest.  Prior to
TDX Module versions 1.5.09 and 2.0.04, the Always-Running-APIC-Timer (ARAT)
feature (CPUID leaf 6: EAX bit 2) was not exposed.  That results in cpuidle
disabling the timer interrupt and invoking the Tick Broadcast framework
to provide a wake-up.  Currently, that falls back to the PIT timer, which
does not work for TDX, resulting in the guest becoming stuck in the idle
loop.
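
The ARAT bit can be checked from CPUID leaf 6 roughly as follows.
cpuid6_eax_has_arat() and cpu_has_arat() are hypothetical helpers; the live
query relies on the GCC/Clang <cpuid.h> __get_cpuid() wrapper and simply
reports false on non-x86 builds.  Per the description above, a guest on a
TDX Module prior to 1.5.09 or 2.0.04 would see this bit clear.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* CPUID leaf 6 (thermal and power management), EAX bit 2: ARAT. */
#define CPUID6_EAX_ARAT (UINT32_C(1) << 2)

/* Pure check on a raw EAX value returned by CPUID leaf 6. */
static bool cpuid6_eax_has_arat(uint32_t eax)
{
	return (eax & CPUID6_EAX_ARAT) != 0;
}

/* Query the running CPU; false if leaf 6 is absent or not on x86. */
static bool cpu_has_arat(void)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if leaf 6 exceeds the maximum basic leaf. */
	if (!__get_cpuid(6, &eax, &ebx, &ecx, &edx))
		return false;
	return cpuid6_eax_has_arat(eax);
#else
	return false;
#endif
}
```

In the kernel itself the equivalent test is boot_cpu_has(X86_FEATURE_ARAT),
which kvm_can_mwait_in_guest() already uses on the host side.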

Conclusion

Users may expect TDX support of MWAIT in a guest to be similar to VMX
support, but KVM cannot ensure that.  Consequently, KVM should not expose
the capability.
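
The approach taken by the patch below can be sketched in user-space C: a
per-VM mask of unsupported disable-exits bits is removed from what the host
would otherwise allow, enabling a masked bit fails with -EINVAL, and MWAIT
(CPUID.01H:ECX bit 3) is cleared from a leaf 1 CPUID entry.  This is a
simplified model, not kernel code: struct vm_model, struct
cpuid_entry_model and the helper names are hypothetical, and the
DISABLE_EXITS_* values are assumed to mirror the KVM_X86_DISABLE_EXITS_*
flags in <linux/kvm.h>.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror KVM_X86_DISABLE_EXITS_* in <linux/kvm.h>. */
#define DISABLE_EXITS_MWAIT  (UINT64_C(1) << 0)
#define DISABLE_EXITS_HLT    (UINT64_C(1) << 1)
#define DISABLE_EXITS_PAUSE  (UINT64_C(1) << 2)
#define DISABLE_EXITS_CSTATE (UINT64_C(1) << 3)

/* CPUID.01H:ECX bit 3 is MONITOR/MWAIT. */
#define CPUID1_ECX_MWAIT (UINT32_C(1) << 3)

/* Hypothetical stand-ins for struct kvm_arch and struct kvm_cpuid_entry2. */
struct vm_model {
	uint64_t unsupported_disable_exits;
};

struct cpuid_entry_model {
	uint32_t function;
	uint32_t ecx;
};

/* Per-VM filtering; a NULL vm models a query with no VM (cf. "if (kvm)"). */
static uint64_t allowed_disable_exits(const struct vm_model *vm,
				      uint64_t host_allowed)
{
	uint64_t r = host_allowed;

	if (vm)
		r &= ~vm->unsupported_disable_exits;
	return r;
}

/* KVM_ENABLE_CAP-style validation: reject any bit that is not allowed. */
static int enable_disable_exits(const struct vm_model *vm,
				uint64_t host_allowed, uint64_t requested)
{
	if (requested & ~allowed_disable_exits(vm, host_allowed))
		return -EINVAL;
	return 0;
}

/* CPUID filtering: hide MWAIT in a leaf 1 entry, leave other leaves alone. */
static void clear_mwait_model(struct cpuid_entry_model *e)
{
	if (e->function == 1)
		e->ecx &= ~CPUID1_ECX_MWAIT;
}
```

A NULL vm presumably corresponds to KVM_CHECK_EXTENSION issued on the
system file descriptor, where no VM exists; that is the case the patch's
"if (kvm)" guard covers.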

Fixes: 0186dd29a2518 ("KVM: TDX: add ioctl to initialize VM with TDX specific parameters")
Signed-off-by: Adrian Hunter <adrian.hunter@...el.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/tdx.c          | 22 +++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  8 +++++---
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7af967aa16f..9c8617217adb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1398,6 +1398,8 @@ struct kvm_arch {
 
 	gpa_t wall_clock;
 
+	u64 unsupported_disable_exits;
+
 	bool mwait_in_guest;
 	bool hlt_in_guest;
 	bool pause_in_guest;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9ad460ef97b0..cdf0dc6cf068 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -132,6 +132,17 @@ static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
 	entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
 }
 
+static bool has_mwait(const struct kvm_cpuid_entry2 *entry)
+{
+	return entry->function == 1 &&
+	       (entry->ecx & __feature_bit(X86_FEATURE_MWAIT));
+}
+
+static void clear_mwait(struct kvm_cpuid_entry2 *entry)
+{
+	entry->ecx &= ~__feature_bit(X86_FEATURE_MWAIT);
+}
+
 static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
 {
 	if (has_tsx(entry))
@@ -139,11 +150,15 @@ static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
 
 	if (has_waitpkg(entry))
 		clear_waitpkg(entry);
+
+	/* Also KVM_X86_DISABLE_EXITS_MWAIT is disallowed in tdx_vm_init() */
+	if (has_mwait(entry))
+		clear_mwait(entry);
 }
 
 static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
 {
-	return has_tsx(entry) || has_waitpkg(entry);
+	return has_tsx(entry) || has_waitpkg(entry) || has_mwait(entry);
 }
 
 #define KVM_TDX_CPUID_NO_SUBLEAF	((__u32)-1)
@@ -615,6 +630,11 @@ int tdx_vm_init(struct kvm *kvm)
 	kvm->arch.has_protected_state = true;
 	kvm->arch.has_private_mem = true;
 	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+	/*
+	 * TDX support for using the MWAIT instruction in a guest has issues,
+	 * so disable it for now. See also tdx_clear_unsupported_cpuid().
+	 */
+	kvm->arch.unsupported_disable_exits |= KVM_X86_DISABLE_EXITS_MWAIT;
 
 	/*
 	 * Because guest TD is protected, VMM can't parse the instruction in TD.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93636f77c42d..bfd4f52286b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4575,7 +4575,7 @@ static inline bool kvm_can_mwait_in_guest(void)
 		boot_cpu_has(X86_FEATURE_ARAT);
 }
 
-static u64 kvm_get_allowed_disable_exits(void)
+static u64 kvm_get_allowed_disable_exits(struct kvm *kvm)
 {
 	u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
 
@@ -4586,6 +4586,8 @@ static u64 kvm_get_allowed_disable_exits(void)
 		if (kvm_can_mwait_in_guest())
 			r |= KVM_X86_DISABLE_EXITS_MWAIT;
 	}
+	if (kvm)
+		r &= ~kvm->arch.unsupported_disable_exits;
 	return r;
 }
 
@@ -4736,7 +4738,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
-		r = kvm_get_allowed_disable_exits();
+		r = kvm_get_allowed_disable_exits(kvm);
 		break;
 	case KVM_CAP_X86_SMM:
 		if (!IS_ENABLED(CONFIG_KVM_SMM))
@@ -6613,7 +6615,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r = -EINVAL;
-		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
+		if (cap->args[0] & ~kvm_get_allowed_disable_exits(kvm))
 			break;
 
 		mutex_lock(&kvm->lock);
-- 
2.48.1

