Message-ID: <20250807094423.4644-1-yan.y.zhao@intel.com>
Date: Thu,  7 Aug 2025 17:44:23 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: pbonzini@...hat.com,
	seanjc@...gle.com
Cc: linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org,
	x86@...nel.org,
	rick.p.edgecombe@...el.com,
	dave.hansen@...el.com,
	kas@...nel.org,
	tabba@...gle.com,
	ackerleytng@...gle.com,
	quic_eberman@...cinc.com,
	michael.roth@....com,
	david@...hat.com,
	vannapurve@...gle.com,
	vbabka@...e.cz,
	thomas.lendacky@....com,
	pgonda@...gle.com,
	zhiquan1.li@...el.com,
	fan.du@...el.com,
	jun.miao@...el.com,
	ira.weiny@...el.com,
	isaku.yamahata@...el.com,
	xiaoyao.li@...el.com,
	binbin.wu@...ux.intel.com,
	chao.p.peng@...el.com,
	yan.y.zhao@...el.com
Subject: [RFC PATCH v2 14/23] KVM: TDX: Split and inhibit huge mappings if a VMExit carries level info

TDX requires guests to accept S-EPT mappings created by the host KVM. Due
to the current implementation of the TDX module, if a guest accepts a GFN
at a lower level after KVM maps it at a higher level, the TDX module will
emulate an EPT violation VMExit to KVM instead of returning a size mismatch
error to the guest. If KVM fails to perform page splitting in the VMExit
handler, the guest's accept operation will be triggered again upon
re-entering the guest, causing a repeated EPT violation VMExit.

The TDX module thus enables the EPT violation VMExit to carry the guest's
accept level when the VMExit is caused by the guest's accept operation.
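For reference, the following sketch mirrors the decoding done by
tdx_check_accept_level() in this patch: the accept level is bits 2:0 of the
INFO field (bits 63:32 of the extended exit qualification), zero-based, so
+1 converts it to a PG_LEVEL_* value.

	u64 eeq_type, eeq_info;
	int level;

	eeq_type = tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK;
	if (eeq_type == TDX_EXT_EXIT_QUAL_TYPE_ACCEPT) {
		eeq_info = (tdx->ext_exit_qualification &
			    TDX_EXT_EXIT_QUAL_INFO_MASK) >>
			   TDX_EXT_EXIT_QUAL_INFO_SHIFT;
		/* 0/1/2 from the TDX module -> PG_LEVEL_4K/2M/1G */
		level = (eeq_info & GENMASK(2, 0)) + 1;
	}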

Therefore, in TDX's EPT violation handler:
(1) Set the guest inhibit bit in the lpage info to prevent the KVM MMU core
    from mapping at a higher level than the guest's accept level.

(2) Split any existing huge mapping at the fault GFN up front, since TDX
    does not support splitting under the shared mmu_lock.

Use the write mmu_lock to protect (1) and (2) for now. If a future KVM/TDX
implementation can perform the actual splitting under the shared mmu_lock
with an enhanced TDX module, (1) could be done under the shared mmu_lock
and (2) would become unnecessary.

As an optimization, this patch calls hugepage_test_guest_inhibit() without
holding the mmu_lock, so that the write mmu_lock is only acquired if the
guest inhibit bit is not already set. This is safe because the guest
inhibit bit is only ever set, never cleared, and the splitting under the
write mmu_lock is performed before the bit is set.
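Abstracted from the hunk below (using the same names as this patch), the
check-then-lock ordering is:

	if (!hugepage_test_guest_inhibit(slot, gfn, level + 1)) {
		scoped_guard(write_lock, &kvm->mmu_lock) {
			/*
			 * Split before setting the one-way inhibit bit, so a
			 * lockless reader that sees the bit set can rely on
			 * the splitting having already completed.
			 */
			ret = kvm_split_cross_boundary_leafs(kvm, &gfn_range, false);
			if (ret)
				return ret;

			hugepage_set_guest_inhibit(slot, gfn, level + 1);
		}
	}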

Link: https://lore.kernel.org/all/a6ffe23fb97e64109f512fa43e9f6405236ed40a.camel@intel.com
Suggested-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
Suggested-by: Sean Christopherson <seanjc@...gle.com>
Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
---
RFC v2
- Change tdx_get_accept_level() to tdx_check_accept_level().
- Invoke kvm_split_cross_boundary_leafs() and hugepage_set_guest_inhibit()
  to change KVM mapping level in a global way according to guest accept
  level. (Rick, Sean).

RFC v1:
- Introduce tdx_get_accept_level() to get guest accept level.
- Use tdx->violation_request_level and tdx->violation_gfn* to pass guest
  accept level to tdx_gmem_private_max_mapping_level() to determine KVM
  mapping level.
---
 arch/x86/kvm/vmx/tdx.c      | 50 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx_arch.h |  3 +++
 2 files changed, 53 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 035d81275be4..71115058e5e6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2019,6 +2019,53 @@ static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcp
 	return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN);
 }
 
+static inline int tdx_check_accept_level(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	struct kvm *kvm = vcpu->kvm;
+	u64 eeq_type, eeq_info;
+	int level = -1;
+
+	if (!slot)
+		return 0;
+
+	eeq_type = tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK;
+	if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_ACCEPT)
+		return 0;
+
+	eeq_info = (tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_INFO_MASK) >>
+		   TDX_EXT_EXIT_QUAL_INFO_SHIFT;
+
+	level = (eeq_info & GENMASK(2, 0)) + 1;
+
+	if (level == PG_LEVEL_4K || level == PG_LEVEL_2M) {
+		if (!hugepage_test_guest_inhibit(slot, gfn, level + 1)) {
+			gfn_t base_gfn = gfn_round_for_level(gfn, level);
+			struct kvm_gfn_range gfn_range = {
+				.start = base_gfn,
+				.end = base_gfn + KVM_PAGES_PER_HPAGE(level),
+				.slot = slot,
+				.may_block = true,
+				.attr_filter = KVM_FILTER_PRIVATE,
+			};
+
+			scoped_guard(write_lock, &kvm->mmu_lock) {
+				int ret;
+
+				ret = kvm_split_cross_boundary_leafs(kvm, &gfn_range, false);
+				if (ret)
+					return ret;
+
+				hugepage_set_guest_inhibit(slot, gfn, level + 1);
+				if (level == PG_LEVEL_4K)
+					hugepage_set_guest_inhibit(slot, gfn, level + 2);
+			}
+		}
+	}
+	return 0;
+}
+
 static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qual;
@@ -2044,6 +2091,9 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 		 */
 		exit_qual = EPT_VIOLATION_ACC_WRITE;
 
+		if (tdx_check_accept_level(vcpu, gpa_to_gfn(gpa)))
+			return RET_PF_RETRY;
+
 		/* Only private GPA triggers zero-step mitigation */
 		local_retry = true;
 	} else {
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index a30e880849e3..af006a73ee05 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -82,7 +82,10 @@ struct tdx_cpuid_value {
 #define TDX_TD_ATTR_PERFMON		BIT_ULL(63)
 
 #define TDX_EXT_EXIT_QUAL_TYPE_MASK	GENMASK(3, 0)
+#define TDX_EXT_EXIT_QUAL_TYPE_ACCEPT  1
 #define TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION  6
+#define TDX_EXT_EXIT_QUAL_INFO_MASK	GENMASK(63, 32)
+#define TDX_EXT_EXIT_QUAL_INFO_SHIFT	32
 /*
  * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
  */
-- 
2.43.2

