[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aXqPym6QSFQTn9Cy@google.com>
Date: Wed, 28 Jan 2026 14:38:02 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
x86@...nel.org, rick.p.edgecombe@...el.com, dave.hansen@...el.com,
kas@...nel.org, tabba@...gle.com, ackerleytng@...gle.com,
michael.roth@....com, david@...nel.org, vannapurve@...gle.com,
sagis@...gle.com, vbabka@...e.cz, thomas.lendacky@....com,
nik.borisov@...e.com, pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com,
francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com,
isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com,
binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 07/24] KVM: x86/tdp_mmu: Introduce split_external_spte()
under write mmu_lock
On Tue, Jan 06, 2026, Yan Zhao wrote:
> Introduce a new valid transition case for splitting and document all valid
> transitions of the mirror page table under write mmu_lock in
> tdp_mmu_set_spte().
...
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 4 ++++
> arch/x86/kvm/mmu/tdp_mmu.c | 29 +++++++++++++++++++++++++----
> 3 files changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 58c5c9b082ca..84fa8689b45c 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -98,6 +98,7 @@ KVM_X86_OP_OPTIONAL(link_external_spt)
> KVM_X86_OP_OPTIONAL(set_external_spte)
> KVM_X86_OP_OPTIONAL(free_external_spt)
> KVM_X86_OP_OPTIONAL(remove_external_spte)
> +KVM_X86_OP_OPTIONAL(split_external_spte)
This is all going in the wrong direction. Sprinking S-EPT callbacks all over the
TDP MMU leaks *more* TDX details into the MMU, and for all intents and purposes
does nothing in terms of encapsulating MMU details in the MMU. E.g. the TDX code
has sanity checks all over the place to ensure the "right" API is called.
The bajillion callbacks also make this code extremely difficult to follow and
review. It requires knowing exactly which TDP MMU paths are used for what
operations, and what paths are (allegedly) unreachable for mirror roots. Adding
hooks at specific points is also brittle, because an unexpected update/change is
more likely to go unnoticed, at least until the system explodes.
There are really only two novel paths: atomic versus non-atomic writes. An atomic
set_spte() can fail, and also needs special handling so that the entire operation
is atomic from KVM's point of view.
There's another outlier, removal of a non-leaf S-EPT page, that I think is worth
keeping separate, because I don't see a sane way of containing the TDX-Module's
ordering requirements to the TDX code. Specifically, the TDX-Module requires that
leaf S-EPT entries be removed before the parent page table can be removed, whereas
KVM prefers to prune the page table and _then_ reap its children.
We _could_ funnel that case into the non-atomic update, but it would either require
propagating the non-leaf removal to TDX after the call_rcu():
call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
which is all kinds of gross, or it would require moving the call_rcu() invocation,
which obviously bleeds TDX details into the TDP MMU. So I think it's work keeping
a dedicated hook for that case, but literally everything else can funnel into a
single callback, invoked from two locations: handle_changed_spte() and
__tdp_mmu_set_spte_atomic().
Then the TDX code is (quite simply, IMO):
static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
u64 new_spte, enum pg_level level)
{
if (is_shadow_present_pte(old_spte) && is_shadow_present_pte(new_spte))
return tdx_sept_split_private_spte(kvm, gfn, old_spte, new_spte, level);
else if (is_shadow_present_pte(old_spte))
return tdx_sept_remove_private_spte(kvm, gfn, old_spte, level);
if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
return -EIO;
if (!is_last_spte(new_spte, level))
return tdx_sept_link_private_spt(kvm, gfn, new_spte, level);
return tdx_sept_map_leaf_spte(kvm, gfn, new_spte, level);
}
Powered by blists - more mailing lists