Message-ID: <aNJGP6lwO9WOqjfh@yzhao56-desk.sh.intel.com>
Date: Tue, 23 Sep 2025 15:03:27 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Rick Edgecombe <rick.p.edgecombe@...el.com>
CC: <kas@...nel.org>, <bp@...en8.de>, <chao.gao@...el.com>,
<dave.hansen@...ux.intel.com>, <isaku.yamahata@...el.com>,
<kai.huang@...el.com>, <kvm@...r.kernel.org>, <linux-coco@...ts.linux.dev>,
<linux-kernel@...r.kernel.org>, <mingo@...hat.com>, <pbonzini@...hat.com>,
<seanjc@...gle.com>, <tglx@...utronix.de>, <x86@...nel.org>,
<vannapurve@...gle.com>
Subject: Re: [PATCH v3 11/16] KVM: TDX: Add x86 ops for external spt cache
On Thu, Sep 18, 2025 at 04:22:19PM -0700, Rick Edgecombe wrote:
> Move mmu_external_spt_cache behind x86 ops.
>
> In the mirror/external MMU concept, the KVM MMU manages a non-active EPT
> tree for private memory (the mirror). The actual active EPT tree that maps
> the private memory is protected inside the TDX module. Whenever the mirror
> EPT is changed, KVM needs to call out into one of a set of x86 ops that
> implement the various update operations with TDX-specific SEAMCALLs and
> other tricks. These implementations operate on the TDX S-EPT (the
> external).
>
> In reality these external operations are designed narrowly with respect to
> TDX particulars. On the surface, the TDX-specific things that happen to
> fulfill these update operations are mostly hidden from the MMU, but there
> is one particular area of interest where some details leak through.
>
> The S-EPT needs pages to use for the S-EPT page tables. These page tables
> need to be allocated before taking the mmu lock, like all the rest. So the
> KVM MMU pre-allocates pages for TDX to use for the S-EPT in the same place
> where it pre-allocates the other page tables. It’s not too bad and fits
> nicely with the others.
>
> However, Dynamic PAMT will need even more pages for the same operations.
> Further, these pages will need to be handed to the arch/x86 side, which
> uses them for DPAMT updates; this is hard to do with the existing
> KVM-based cache. The details living in core MMU code start to add up.
>
> So in preparation for making it more complicated, move the external page
> table cache into TDX code by putting it behind some x86 ops. Have one for
> topping up and one for allocation. Don’t go so far as to try to hide the
> existence of external page tables completely from the generic MMU, as they
> are currently stored in their mirror struct kvm_mmu_page and it’s quite
> handy.
>
> To plumb the memory cache operations through tdx.c, temporarily export
> some of the functions. These exports will be removed in future changes.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> ---
> v3:
> - New patch
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 2 ++
> arch/x86/include/asm/kvm_host.h | 11 ++++++-----
> arch/x86/kvm/mmu/mmu.c | 4 +---
> arch/x86/kvm/mmu/mmu_internal.h | 2 +-
> arch/x86/kvm/vmx/tdx.c | 17 +++++++++++++++++
> arch/x86/kvm/vmx/tdx.h | 2 ++
> virt/kvm/kvm_main.c | 2 ++
> 7 files changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index 62c3e4de3303..a4e4c1333224 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -98,6 +98,8 @@ KVM_X86_OP_OPTIONAL(link_external_spt)
> KVM_X86_OP_OPTIONAL(set_external_spte)
> KVM_X86_OP_OPTIONAL(free_external_spt)
> KVM_X86_OP_OPTIONAL(remove_external_spte)
> +KVM_X86_OP_OPTIONAL(alloc_external_fault_cache)
> +KVM_X86_OP_OPTIONAL(topup_external_fault_cache)
> KVM_X86_OP(has_wbinvd_exit)
> KVM_X86_OP(get_l2_tsc_offset)
> KVM_X86_OP(get_l2_tsc_multiplier)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index cb86f3cca3e9..e4cf0f40c757 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -855,11 +855,6 @@ struct kvm_vcpu_arch {
> struct kvm_mmu_memory_cache mmu_shadow_page_cache;
> struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
> struct kvm_mmu_memory_cache mmu_page_header_cache;
> - /*
> - * This cache is to allocate external page table. E.g. private EPT used
> - * by the TDX module.
> - */
> - struct kvm_mmu_memory_cache mmu_external_spt_cache;
>
> /*
> * QEMU userspace and the guest each have their own FPU state.
> @@ -1856,6 +1851,12 @@ struct kvm_x86_ops {
> int (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> kvm_pfn_t pfn_for_gfn);
>
> +	/* Allocate a page from the external fault cache. */
> + void *(*alloc_external_fault_cache)(struct kvm_vcpu *vcpu);
> +
> + /* Top up extra pages needed for faulting in external page tables. */
> + int (*topup_external_fault_cache)(struct kvm_vcpu *vcpu);
> +
> bool (*has_wbinvd_exit)(void);
>
> u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 55335dbd70ce..b3feaee893b2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -601,8 +601,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> if (r)
> return r;
> if (kvm_has_mirrored_tdp(vcpu->kvm)) {
> - r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_external_spt_cache,
> - PT64_ROOT_MAX_LEVEL);
> + r = kvm_x86_call(topup_external_fault_cache)(vcpu);
> if (r)
> return r;
> }
> @@ -625,7 +624,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
> - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_external_spt_cache);
Though the pre-allocated pages are eventually freed in tdx_vcpu_free() in
patch 13, it looks like they are leaked in this patch: pages still sitting
in tdx->mmu_external_spt_cache when the vCPU is destroyed have no free path
until patch 13.

BTW, why not invoke kvm_x86_call(free_external_fault_cache)(vcpu) here?
It looks more natural to free the remaining pre-allocated pages in
mmu_free_memory_caches(), which is invoked after kvm_mmu_unload(vcpu),
whereas tdx_vcpu_free() runs before it.
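
Something along these lines (untested sketch; "free_external_fault_cache"
is just the op name suggested above, and kvm_mmu_free_memory_cache() would
presumably need an EXPORT_SYMBOL_GPL like the other two helpers):

	/* arch/x86/include/asm/kvm-x86-ops.h */
	KVM_X86_OP_OPTIONAL(free_external_fault_cache)

	/* arch/x86/include/asm/kvm_host.h, in struct kvm_x86_ops */
	/* Free pages remaining in the external fault cache. */
	void (*free_external_fault_cache)(struct kvm_vcpu *vcpu);

	/* arch/x86/kvm/vmx/tdx.c */
	static void tdx_free_external_fault_cache(struct kvm_vcpu *vcpu)
	{
		struct vcpu_tdx *tdx = to_tdx(vcpu);

		kvm_mmu_free_memory_cache(&tdx->mmu_external_spt_cache);
	}

	/* arch/x86/kvm/vmx/tdx.c, in tdx_hardware_setup() */
	vt_x86_ops.free_external_fault_cache = tdx_free_external_fault_cache;

	/* arch/x86/kvm/mmu/mmu.c, in mmu_free_memory_caches() */
	if (kvm_has_mirrored_tdp(vcpu->kvm))
		kvm_x86_call(free_external_fault_cache)(vcpu);

That would keep the cache lifecycle (topup/alloc/free) symmetric across the
ops, with the free side guarded by kvm_has_mirrored_tdp() like the topup
side already is.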
> kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
> }
>
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index ed5c01df21ba..1fa94ab100be 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -175,7 +175,7 @@ static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_
> * Therefore, KVM does not need to initialize or access external_spt.
> * KVM only interacts with sp->spt for private EPT operations.
> */
> - sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
> + sp->external_spt = kvm_x86_call(alloc_external_fault_cache)(vcpu);
> }
>
> static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct kvm_mmu_page *root)
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index dd2be7bedd48..6c9e11be9705 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1590,6 +1590,21 @@ static void tdx_unpin(struct kvm *kvm, struct page *page)
> put_page(page);
> }
>
> +static void *tdx_alloc_external_fault_cache(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> +
> + return kvm_mmu_memory_cache_alloc(&tdx->mmu_external_spt_cache);
> +}
> +
> +static int tdx_topup_external_fault_cache(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_tdx *tdx = to_tdx(vcpu);
> +
> + return kvm_mmu_topup_memory_cache(&tdx->mmu_external_spt_cache,
> + PT64_ROOT_MAX_LEVEL);
> +}
> +
> static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> enum pg_level level, struct page *page)
> {
> @@ -3647,4 +3662,6 @@ void __init tdx_hardware_setup(void)
> vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
> vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
> vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
> + vt_x86_ops.topup_external_fault_cache = tdx_topup_external_fault_cache;
> + vt_x86_ops.alloc_external_fault_cache = tdx_alloc_external_fault_cache;
> }
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index f4e609a745ee..cd7993ef056e 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -70,6 +70,8 @@ struct vcpu_tdx {
>
> u64 map_gpa_next;
> u64 map_gpa_end;
> +
> + struct kvm_mmu_memory_cache mmu_external_spt_cache;
> };
>
> void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fee108988028..f05e6d43184b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -404,6 +404,7 @@ int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
> {
> return __kvm_mmu_topup_memory_cache(mc, KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE, min);
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_topup_memory_cache);
>
> int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
> {
> @@ -436,6 +437,7 @@ void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> BUG_ON(!p);
> return p;
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_memory_cache_alloc);
> #endif
>
> static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> --
> 2.51.0
>