Message-ID: <20260129011517.3545883-32-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:03 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Kiryl Shutsemau <kas@...nel.org>, Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev,
kvm@...r.kernel.org, Kai Huang <kai.huang@...el.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>, Yan Zhao <yan.y.zhao@...el.com>,
Vishal Annapurve <vannapurve@...gle.com>, Ackerley Tng <ackerleytng@...gle.com>,
Sagi Shahar <sagis@...gle.com>, Binbin Wu <binbin.wu@...ux.intel.com>,
Xiaoyao Li <xiaoyao.li@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>
Subject: [RFC PATCH v5 31/45] KVM: x86/mmu: Prevent hugepage promotion for
mirror roots in fault path

From: Rick Edgecombe <rick.p.edgecombe@...el.com>

Disallow hugepage promotion in the TDP MMU for mirror roots, as KVM
doesn't currently support promoting S-EPT entries due to the complexity
incurred by the TDX-Module's rules for hugepage promotion:

 - The current TDX-Module requires all 4KB leaf entries to be either all
   PENDING or all ACCEPTED before a promotion to 2MB can succeed.  This
   requirement prevents successful page merging after partially converting
   a 2MB range from private to shared and then back to private, which is
   the primary scenario necessitating page promotion.

 - The TDX-Module effectively requires a break-before-make sequence (to
   satisfy its TLB flushing rules), i.e. it creates a window of time in
   which a different vCPU can encounter faults on a SPTE that KVM is
   trying to promote to a hugepage.  To avoid unexpected BUSY errors, KVM
   would need to FREEZE the non-leaf SPTE before replacing it with a huge
   SPTE (see the sketch below).

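For reference, the rough ordering KVM would need to follow to promote an
S-EPT mapping is sketched below.  Every helper is a hypothetical
placeholder (the real flow would be built on the TDX-Module's
BLOCK/TRACK/PROMOTE operations); this is purely illustrative, not an
existing KVM or TDX API:

	/* Illustrative sketch only; all helpers below are placeholders. */
	static int sept_promote_sketch(struct kvm *kvm, struct tdp_iter *iter,
				       gfn_t gfn, int level)
	{
		/*
		 * FREEZE the non-leaf SPTE so concurrent faults on other
		 * vCPUs retry instead of racing with the promotion and
		 * tripping BUSY errors from the TDX-Module.
		 */
		if (!sept_freeze_nonleaf_spte(kvm, iter))
			return -EBUSY;

		/*
		 * Break-before-make: block the old translation and flush
		 * TLBs per the TDX-Module's tracking rules before attempting
		 * the merge.
		 */
		sept_block_and_track(kvm, gfn, level);

		/*
		 * Merge the 4KB mappings; this succeeds only if every leaf
		 * is uniformly PENDING or uniformly ACCEPTED.
		 */
		if (sept_merge_page(kvm, gfn, level))
			return -EIO;

		/* Only now install the huge SPTE in the mirror page table. */
		sept_install_huge_spte(kvm, iter);
		return 0;
	}

Until something along those lines exists, simply force such faults down
to 4KB granularity.
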
Disable hugepage promotion for all map() operations, as supporting page
promotion when building the initial image is still non-trivial, and the
vast majority of images are ~4MB or less, i.e. the benefit of creating
hugepages during TD build time is minimal.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
Co-developed-by: Yan Zhao <yan.y.zhao@...el.com>
Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
[sean: check root, add comment, rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
 arch/x86/kvm/mmu/mmu.c     |  3 ++-
 arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ecbf216d96f..45650f70eeab 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3419,7 +3419,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    ((spte_to_child_sp(spte)->nx_huge_page_disallowed) ||
+	     is_mirror_sp(spte_to_child_sp(spte)))) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 01e3e4f4baa5..f8ebdd0c6114 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1222,7 +1222,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		/*
+		 * Don't replace a page table (non-leaf) SPTE with a huge SPTE
+		 * (a.k.a. hugepage promotion) if the NX hugepage workaround is
+		 * enabled, as doing so will cause significant thrashing if one
+		 * or more leaf SPTEs needs to be executable.
+		 *
+		 * Disallow hugepage promotion for mirror roots as KVM doesn't
+		 * (yet) support promoting S-EPT entries while holding mmu_lock
+		 * for read (due to complexity induced by the TDX-Module APIs).
+		 */
+		if (fault->nx_huge_page_workaround_enabled || is_mirror_sp(root))
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		/*
--
2.53.0.rc1.217.geba53bf80e-goog