[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aYMMHVvwDjZ7Lz9l@yzhao56-desk.sh.intel.com>
Date: Wed, 4 Feb 2026 17:06:37 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
<x86@...nel.org>, Kiryl Shutsemau <kas@...nel.org>, Paolo Bonzini
<pbonzini@...hat.com>, <linux-kernel@...r.kernel.org>,
<linux-coco@...ts.linux.dev>, <kvm@...r.kernel.org>, Kai Huang
<kai.huang@...el.com>, Rick Edgecombe <rick.p.edgecombe@...el.com>, "Vishal
Annapurve" <vannapurve@...gle.com>, Ackerley Tng <ackerleytng@...gle.com>,
Sagi Shahar <sagis@...gle.com>, Binbin Wu <binbin.wu@...ux.intel.com>,
Xiaoyao Li <xiaoyao.li@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal
to S-EPT in handle_changed_spte()
On Wed, Jan 28, 2026 at 05:14:40PM -0800, Sean Christopherson wrote:
> Invoke .remove_external_spte() in handle_changed_spte() as appropriate
> instead of relying on callers to do the right thing. Relying on callers
> to invoke .remove_external_spte() is confusing and brittle, e.g. subtly
> relies tdp_mmu_set_spte_atomic() never removing SPTEs, and removing an
> S-EPT entry in tdp_mmu_set_spte() is bizarre (yeah, the VM is bugged so
> it doesn't matter in practice, but it's still weird).
>
> Implementing rules-based logic in a common chokepoint will also make it
> easier to reason about the correctness of splitting hugepages when support
> for S-EPT hugepages comes along.
>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 43 +++++++++++++-------------------------
> 1 file changed, 14 insertions(+), 29 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 8743cd020d12..27ac520f2a89 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -359,25 +359,6 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> }
>
> -static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
> - int level)
> -{
> - /*
> - * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external
> - * PTs are removed in a special order, involving free_external_spt().
> - * But remove_external_spte() will be called on non-leaf PTEs via
> - * __tdp_mmu_zap_root(), so avoid the error the former would return
> - * in this case.
> - */
> - if (!is_last_spte(old_spte, level))
> - return;
> -
> - /* Zapping leaf spte is allowed only when write lock is held. */
> - lockdep_assert_held_write(&kvm->mmu_lock);
> -
> - kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
> -}
> -
> /**
> * handle_removed_pt() - handle a page table removed from the TDP structure
> *
> @@ -473,11 +454,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> }
> handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), sptep, gfn,
> old_spte, FROZEN_SPTE, level, shared);
> -
> - if (is_mirror_sp(sp)) {
> - KVM_BUG_ON(shared, kvm);
> - remove_external_spte(kvm, gfn, old_spte, level);
> - }
> }
>
> if (is_mirror_sp(sp) &&
> @@ -590,10 +566,21 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> * the paging structure. Note the WARN on the PFN changing without the
> * SPTE being converted to a hugepage (leaf) or being zapped. Shadow
> * pages are kernel allocations and should never be migrated.
> + *
> + * When removing leaf entries from a mirror, immediately propagate the
> + * changes to the external page tables. Note, non-leaf mirror entries
> + * are handled by handle_removed_pt(), as TDX requires that all leaf
> + * entries are removed before the owning page table. Note #2, writes
> + * to make mirror PTEs shadow-present are propagated to external page
> + * tables by __tdp_mmu_set_spte_atomic(), as KVM needs to ensure the
> + * external page table was successfully updated before marking the
> + * mirror SPTE present.
> */
> if (was_present && !was_leaf &&
> (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> + else if (was_leaf && is_mirror_sptep(sptep) && !is_leaf)
Should we check !is_present instead of !is_leaf?
e.g. a transition from a present leaf entry to a present non-leaf entry could
also trigger this if case.
Besides, need "KVM_BUG_ON(shared, kvm)" in this case.
> + kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
> }
>
> static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
> @@ -725,12 +712,10 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>
> /*
> * Users that do non-atomic setting of PTEs don't operate on mirror
> - * roots, so don't handle it and bug the VM if it's seen.
> + * roots. Bug the VM as this path doesn't propagate such writes to the
> + * external page tables.
> */
> - if (is_mirror_sptep(sptep)) {
> - KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm);
> - remove_external_spte(kvm, gfn, old_spte, level);
> - }
> + KVM_BUG_ON(is_mirror_sptep(sptep) && is_shadow_present_pte(new_spte), kvm);
>
> return old_spte;
> }
> --
> 2.53.0.rc1.217.geba53bf80e-goog
>
Powered by blists - more mailing lists