Message-ID: <CAAhR5DF=Yzb6ThiLDtktiOnAG3n+u9jZZahJiuUFR9JFCsDw0A@mail.gmail.com>
Date: Fri, 5 Dec 2025 00:14:46 -0600
From: Sagi Shahar <sagis@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, seanjc@...gle.com, linux-kernel@...r.kernel.org, 
	kvm@...r.kernel.org, x86@...nel.org, rick.p.edgecombe@...el.com, 
	dave.hansen@...el.com, kas@...nel.org, tabba@...gle.com, 
	ackerleytng@...gle.com, quic_eberman@...cinc.com, michael.roth@....com, 
	david@...hat.com, vannapurve@...gle.com, vbabka@...e.cz, 
	thomas.lendacky@....com, pgonda@...gle.com, zhiquan1.li@...el.com, 
	fan.du@...el.com, jun.miao@...el.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, binbin.wu@...ux.intel.com, 
	chao.p.peng@...el.com
Subject: Re: [RFC PATCH v2 21/23] KVM: TDX: Preallocate PAMT pages to be used
 in split path

On Thu, Aug 7, 2025 at 4:48 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
>
> From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
>
> Preallocate a page to be used in the split_external_spt() path.
>
> The kernel needs one PAMT page pair for external_spt and one that is provided
> directly to the TDH.MEM.PAGE.DEMOTE SEAMCALL.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Co-developed-by: Yan Zhao <yan.y.zhao@...el.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> ---
> RFC v2:
> - Pulled from
>   git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git tdx/dpamt-huge.
> - Implemented the flow of topup pamt_page_cache in
>   tdp_mmu_split_huge_pages_root() (Yan)
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/mmu/mmu.c          |  1 +
>  arch/x86/kvm/mmu/tdp_mmu.c      | 51 +++++++++++++++++++++++++++++++++
>  3 files changed, 54 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6b6c46c27390..508b133df903 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1591,6 +1591,8 @@ struct kvm_arch {
>  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
>         struct kvm_mmu_memory_cache split_desc_cache;
>
> +       struct kvm_mmu_memory_cache pamt_page_cache;
> +

The latest DPAMT patches use a per-vcpu tdx_prealloc struct to preallocate
pages for PAMT. I'm wondering if you've considered how this would work here,
since some of the calls requiring PAMT originate from user space ioctls and
are therefore not associated with a vcpu.

Since tdx_prealloc is a per-vcpu struct, there are no race issues when
multiple vcpus need to add PAMT pages. Here it would be trickier because,
theoretically, multiple threads could split different pages simultaneously.
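
To make that concern concrete, here is a rough sketch of one shape a shared
per-VM pool could take if concurrent splitters running under read-mode
mmu_lock need to consume from it safely (illustration only; the pamt_pool_*
names are made up and this isn't code from either series), e.g. a page list
guarded by its own spinlock instead of a bare kvm_mmu_memory_cache, which has
no internal locking:

#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm_types.h>
#include <linux/spinlock.h>

/*
 * Hypothetical sketch, not part of this patch or the DPAMT series.
 * kvm_mmu_memory_cache assumes a single user, so a per-VM pool that can
 * be consumed by several threads splitting under read-mode mmu_lock
 * needs its own serialization.
 */
struct pamt_page_pool {
	spinlock_t lock;		/* protects @pages and @nr_pages */
	struct list_head pages;		/* preallocated PAMT pages */
	int nr_pages;
};

/* Called with mmu_lock dropped; may sleep. */
static int pamt_pool_topup(struct pamt_page_pool *pool, int min)
{
	while (READ_ONCE(pool->nr_pages) < min) {
		struct page *page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);

		if (!page)
			return -ENOMEM;

		spin_lock(&pool->lock);
		list_add(&page->lru, &pool->pages);
		pool->nr_pages++;
		spin_unlock(&pool->lock);
	}

	return 0;
}

/* Called with mmu_lock held (read or write); never sleeps. */
static struct page *pamt_pool_alloc(struct pamt_page_pool *pool)
{
	struct page *page = NULL;

	spin_lock(&pool->lock);
	if (!list_empty(&pool->pages)) {
		page = list_first_entry(&pool->pages, struct page, lru);
		list_del(&page->lru);
		pool->nr_pages--;
	}
	spin_unlock(&pool->lock);

	return page;
}

The point being: the existing per-vcpu tdx_prealloc sidesteps this entirely,
whereas a per-VM pamt_page_cache would need something along those lines (or a
guarantee that consumers only run under mmu_lock held for write).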

>         gfn_t gfn_direct_bits;
>
>         /*
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f23d8fc59323..e581cee37f64 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6848,6 +6848,7 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
>         kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache);
>         kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache);
>         kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache);
> +       kvm_mmu_free_memory_cache(&kvm->arch.pamt_page_cache);
>  }
>
>  void kvm_mmu_uninit_vm(struct kvm *kvm)
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index eb758aaa4374..064c4e823658 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1584,6 +1584,27 @@ static bool iter_cross_boundary(struct tdp_iter *iter, gfn_t start, gfn_t end)
>                  (iter->gfn + KVM_PAGES_PER_HPAGE(iter->level)) <= end);
>  }
>
> +static bool need_topup_mirror_caches(struct kvm *kvm)
> +{
> +       int nr = tdx_nr_pamt_pages() * 2;
> +
> +       return kvm_mmu_memory_cache_nr_free_objects(&kvm->arch.pamt_page_cache) < nr;
> +}
> +
> +static int topup_mirror_caches(struct kvm *kvm)
> +{
> +       int r, nr;
> +
> +       /* One for external_spt, one for TDH.MEM.PAGE.DEMOTE */
> +       nr = tdx_nr_pamt_pages() * 2;
> +
> +       r = kvm_mmu_topup_memory_cache(&kvm->arch.pamt_page_cache, nr);
> +       if (r)
> +               return r;
> +
> +       return 0;
> +}
> +
>  static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
>                                          struct kvm_mmu_page *root,
>                                          gfn_t start, gfn_t end,
> @@ -1656,6 +1677,36 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
>                         continue;
>                 }
>
> +               if (is_mirror_sp(root) && need_topup_mirror_caches(kvm)) {
> +                       int r;
> +
> +                       rcu_read_unlock();
> +
> +                       if (shared)
> +                               read_unlock(&kvm->mmu_lock);
> +                       else
> +                               write_unlock(&kvm->mmu_lock);
> +
> +                       r = topup_mirror_caches(kvm);
> +
> +                       if (shared)
> +                               read_lock(&kvm->mmu_lock);
> +                       else
> +                               write_lock(&kvm->mmu_lock);
> +
> +                       if (r) {
> +                               trace_kvm_mmu_split_huge_page(iter.gfn,
> +                                                             iter.old_spte,
> +                                                             iter.level, r);
> +                               return r;
> +                       }
> +
> +                       rcu_read_lock();
> +
> +                       iter.yielded = true;
> +                       continue;
> +               }
> +
>                 tdp_mmu_init_child_sp(sp, &iter);
>
>                 if (tdp_mmu_split_huge_page(kvm, &iter, sp, shared))
> --
> 2.43.2
>
>
