Message-ID: <CAGtprH8hk0akN1pbxO95O_AyHf-BDN5tOH-Lbxg-qcTMg-27UQ@mail.gmail.com>
Date: Wed, 10 Dec 2025 17:42:44 -0800
From: Vishal Annapurve <vannapurve@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Sagi Shahar <sagis@...gle.com>, pbonzini@...hat.com, seanjc@...gle.com,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org, x86@...nel.org,
rick.p.edgecombe@...el.com, dave.hansen@...el.com, kas@...nel.org,
tabba@...gle.com, ackerleytng@...gle.com, quic_eberman@...cinc.com,
michael.roth@....com, david@...hat.com, vbabka@...e.cz,
thomas.lendacky@....com, pgonda@...gle.com, zhiquan1.li@...el.com,
fan.du@...el.com, jun.miao@...el.com, ira.weiny@...el.com,
isaku.yamahata@...el.com, xiaoyao.li@...el.com, binbin.wu@...ux.intel.com,
chao.p.peng@...el.com
Subject: Re: [RFC PATCH v2 21/23] KVM: TDX: Preallocate PAMT pages to be used
in split path
On Sun, Dec 7, 2025 at 9:51 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
>
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 6b6c46c27390..508b133df903 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1591,6 +1591,8 @@ struct kvm_arch {
> > > #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> > > struct kvm_mmu_memory_cache split_desc_cache;
> > >
> > > + struct kvm_mmu_memory_cache pamt_page_cache;
> > > +
> >
> > The latest DPAMT patches use a per-vcpu tdx_prealloc struct to handle
> > preallocating pages for pamt. I'm wondering if you've considered how
> > this would work here since some of the calls requiring pamt originate
> > from user space ioctls and therefore are not associated with a vcpu.
> I'll use a per-VM tdx_prealloc struct for splitting here, similar to the
> per-VM pamt_page_cache.
>
> diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> index 43dd295b7fd6..91bea25da528 100644
> --- a/arch/x86/kvm/vmx/tdx.h
> +++ b/arch/x86/kvm/vmx/tdx.h
> @@ -48,6 +48,9 @@ struct kvm_tdx {
> * Set/unset is protected with kvm->mmu_lock.
> */
> bool wait_for_sept_zap;
> +
> + spinlock_t external_kvm_split_cache_lock;
> + struct tdx_prealloc prealloc_split_cache;
> };
>
> > Since tdx_prealloc is a per-vCPU struct, there are no race issues
> > when multiple vCPUs need to add PAMT pages, but here it would be
> > trickier because, theoretically, multiple threads could split
> > different pages simultaneously.
> A spin lock external_kvm_split_cache_lock is introduced to protect the cache
> enqueue and dequeue.
> (a) When tdp_mmu_split_huge_pages_root() is invoked under write mmu_lock:
> - Since cache dequeue is already under write mmu_lock in
> tdp_mmu_split_huge_page()-->tdx_sept_split_private_spte(), acquiring/
> releasing another spin lock doesn't matter.
> - Though the cache enqueue in topup_external_split_cache() is not protected
> by mmu_lock, protecting enqueue with a spinlock should not reduce
> concurrency.
Even with the spin lock protecting the cache top-up and consumption
operations, is it possible that one split context consumes the top-up
performed by another split context, causing that other context's
subsequent consumptions to fail?
>
> (b) When tdp_mmu_split_huge_pages_root() is invoked under read mmu_lock:
> Introducing a new spinlock may briefly reduce concurrency (a necessary
> cost of protecting the cache).
> However, there's no known (future) use case for multiple threads invoking
> tdp_mmu_split_huge_pages_root() on mirror root under shared mmu_lock.
>
> For future splitting under shared mmu_lock in the fault path, we'll use
> the per-vCPU tdx_prealloc instead of the per-VM cache. TDX can leverage
> kvm_get_running_vcpu() to differentiate between the two caches.
>
>