Message-ID: <20260129011517.3545883-40-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:11 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Kiryl Shutsemau <kas@...nel.org>, Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev,
kvm@...r.kernel.org, Kai Huang <kai.huang@...el.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>, Yan Zhao <yan.y.zhao@...el.com>,
Vishal Annapurve <vannapurve@...gle.com>, Ackerley Tng <ackerleytng@...gle.com>,
Sagi Shahar <sagis@...gle.com>, Binbin Wu <binbin.wu@...ux.intel.com>,
Xiaoyao Li <xiaoyao.li@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>
Subject: [RFC PATCH v5 39/45] KVM: TDX: Add core support for
splitting/demoting 2MiB S-EPT to 4KiB
From: Yan Zhao <yan.y.zhao@...el.com>
Add support for splitting, a.k.a. demoting, a 2MiB S-EPT hugepage to its
512 constituent 4KiB pages. As per the TDX-Module rules, first invoke
MEM.RANGE.BLOCK to put the huge S-EPT entry into a splittable state, then
do MEM.TRACK and kick all vCPUs outside of guest mode to flush TLBs, and
finally do MEM.PAGE.DEMOTE to demote/split the huge S-EPT entry.
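
A rough sketch of the flow, mirroring the implementation below (error
handling and the tdh_do_no_vcpus() wrapping are omitted for brevity):

  /* 1) BLOCK the huge S-EPT entry so that it becomes splittable. */
  err = tdh_mem_range_block(&kvm_tdx->td, gpa, level, &entry, &level_state);

  /* 2) TRACK: bump the TLB epoch and kick vCPUs to flush stale translations. */
  tdx_track(kvm);

  /* 3) DEMOTE the blocked entry into a page table of 512 4KiB entries. */
  err = tdh_mem_page_demote(&kvm_tdx->td, gpa, level, spte_to_pfn(old_spte),
                            external_spt, &to_tdx(vcpu)->pamt_cache,
                            &entry, &level_state);
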
Assert the mmu_lock is held for write, as the BLOCK => TRACK => DEMOTE
sequence needs to be "atomic" to guarantee success (and because mmu_lock
must be held for write to use tdh_do_no_vcpus()).
Note, even with kvm->mmu_lock held for write, tdh_mem_page_demote() may
contend with tdh_vp_enter() and potentially with the guest's S-EPT entry
operations. Therefore, wrap the call with tdh_do_no_vcpus() to kick other
vCPUs out of the guest and prevent re-entry via tdh_vp_enter(), thus
ensuring the DEMOTE succeeds.
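
For reference, a purely illustrative sketch of the tdh_do_no_vcpus()
semantics assumed above; the real helper is introduced earlier in this
series and also keeps vCPUs from re-entering the guest (i.e. blocks
tdh_vp_enter()) until the wrapped SEAMCALL completes, a detail omitted
here:

  /* Illustrative only, NOT the actual helper from this series. */
  #define tdh_do_no_vcpus(fn, kvm, args...)				\
  ({									\
  	u64 __err;							\
  									\
  	lockdep_assert_held_write(&(kvm)->mmu_lock);			\
  									\
  	__err = fn(args);						\
  	if (unlikely(tdx_operand_busy(__err))) {			\
  		/* Kick all vCPUs out of the guest, then retry once. */	\
  		kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); \
  		__err = fn(args);					\
  	}								\
  	__err;								\
  })
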
Signed-off-by: Xiaoyao Li <xiaoyao.li@...el.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
[sean: wire up via tdx_sept_link_private_spt(), massage changelog]
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
arch/x86/kvm/vmx/tdx.c | 51 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e90610540a0b..af63364c8713 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1776,6 +1776,52 @@ static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
return virt_to_page(sp->external_spt);
}
+/*
+ * Split a huge mapping down to the target level. Currently only 2MiB
+ * mappings are supported (KVM doesn't yet support 1GiB for TDX guests).
+ *
+ * Do "BLOCK + TRACK + kick vCPUs (inside tdx_track())", as DEMOTE doesn't
+ * yet support the NON-BLOCKING-RESIZE feature. No UNBLOCK is needed after
+ * a successful DEMOTE.
+ *
+ * Under write mmu_lock, kick all vCPUs out of the guest (inside
+ * tdh_do_no_vcpus()) to ensure DEMOTE succeeds on the second invocation if
+ * the first invocation returns BUSY.
+ */
+static int tdx_sept_split_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+ u64 new_spte, enum pg_level level)
+{
+ struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+ gpa_t gpa = gfn_to_gpa(gfn);
+ u64 err, entry, level_state;
+ struct page *external_spt;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ external_spt = tdx_spte_to_external_spt(kvm, gfn, new_spte, level);
+ if (!external_spt)
+ return -EIO;
+
+ if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
+ return -EIO;
+
+ err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
+ level, &entry, &level_state);
+ if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
+ return -EIO;
+
+ tdx_track(kvm);
+
+ err = tdh_do_no_vcpus(tdh_mem_page_demote, kvm, &kvm_tdx->td, gpa,
+ level, spte_to_pfn(old_spte), external_spt,
+ &to_tdx(vcpu)->pamt_cache, &entry, &level_state);
+ if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_DEMOTE, entry, level_state, kvm))
+ return -EIO;
+
+ return 0;
+}
+
static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, u64 new_spte,
enum pg_level level)
{
@@ -1853,9 +1899,8 @@ static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
u64 new_spte, enum pg_level level)
{
- /* TODO: Support replacing huge SPTE with non-leaf SPTE. (a.k.a. demotion). */
- if (KVM_BUG_ON(is_shadow_present_pte(old_spte) && is_shadow_present_pte(new_spte), kvm))
- return -EIO;
+ if (is_shadow_present_pte(old_spte) && is_shadow_present_pte(new_spte))
+ return tdx_sept_split_private_spte(kvm, gfn, old_spte, new_spte, level);
else if (is_shadow_present_pte(old_spte))
return tdx_sept_remove_private_spte(kvm, gfn, old_spte, level);
--
2.53.0.rc1.217.geba53bf80e-goog