Message-ID: <aTouJ0SaFp9pluUN@yzhao56-desk.sh.intel.com>
Date: Thu, 11 Dec 2025 10:36:23 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: Sagi Shahar <sagis@...gle.com>, <pbonzini@...hat.com>,
<seanjc@...gle.com>, <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
<x86@...nel.org>, <rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
<kas@...nel.org>, <tabba@...gle.com>, <ackerleytng@...gle.com>,
<quic_eberman@...cinc.com>, <michael.roth@....com>, <david@...hat.com>,
<vbabka@...e.cz>, <thomas.lendacky@....com>, <pgonda@...gle.com>,
<zhiquan1.li@...el.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
<ira.weiny@...el.com>, <isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>,
<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH v2 21/23] KVM: TDX: Preallocate PAMT pages to be used
in split path
On Wed, Dec 10, 2025 at 05:42:44PM -0800, Vishal Annapurve wrote:
> On Sun, Dec 7, 2025 at 9:51 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > > index 6b6c46c27390..508b133df903 100644
> > > > --- a/arch/x86/include/asm/kvm_host.h
> > > > +++ b/arch/x86/include/asm/kvm_host.h
> > > > @@ -1591,6 +1591,8 @@ struct kvm_arch {
> > > > #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> > > > struct kvm_mmu_memory_cache split_desc_cache;
> > > >
> > > > + struct kvm_mmu_memory_cache pamt_page_cache;
> > > > +
> > >
> > > The latest DPAMT patches use a per-vcpu tdx_prealloc struct to handle
> > > preallocating pages for pamt. I'm wondering if you've considered how
> > > this would work here since some of the calls requiring pamt originate
> > > from user space ioctls and therefore are not associated with a vcpu.
> > I'll use a per-VM tdx_prealloc struct for splitting here, similar to the
> > per-VM pamt_page_cache.
> >
> > diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> > index 43dd295b7fd6..91bea25da528 100644
> > --- a/arch/x86/kvm/vmx/tdx.h
> > +++ b/arch/x86/kvm/vmx/tdx.h
> > @@ -48,6 +48,9 @@ struct kvm_tdx {
> > * Set/unset is protected with kvm->mmu_lock.
> > */
> > bool wait_for_sept_zap;
> > +
> > + spinlock_t external_kvm_split_cache_lock;
> > + struct tdx_prealloc prealloc_split_cache;
> > };
> >
> > > Since the tdx_prealloc is a per vcpu struct there are no race issues
> > > when multiple vcpus need to add pamt pages, but it would be trickier
> > > here because, theoretically, multiple threads could split different
> > > pages simultaneously.
> > A spin lock external_kvm_split_cache_lock is introduced to protect the cache
> > enqueue and dequeue.
> > (a) When tdp_mmu_split_huge_pages_root() is invoked under write mmu_lock:
> > - Since cache dequeue is already under write mmu_lock in
> > tdp_mmu_split_huge_page()-->tdx_sept_split_private_spte(), acquiring/
> > releasing another spin lock doesn't matter.
> > - Though the cache enqueue in topup_external_split_cache() is not protected
> > by mmu_lock, protecting enqueue with a spinlock should not reduce
> > concurrency.
>
> Even with the spin lock protecting the cache topup/consumption
> operations, is it possible that one split operation context consumes
> the top-up performed by another split operation, causing the
> subsequent consumption to fail?
The sequence of check topup, topup, and consume is as follows:

1. write_lock(&kvm->mmu_lock)
   check topup
2. write_unlock(&kvm->mmu_lock)
   topup (get/put split lock to enqueue)
3. write_lock(&kvm->mmu_lock)
   check topup (goto 2 if topup is necessary) (*)
   get split lock
   consume
   put split lock
   write_unlock(&kvm->mmu_lock)
Note: due to the "iter.yielded = true" and "continue" after the topup (see the
diff I posted in my last reply), consuming does not directly follow the topup,
i.e., there is step 3.
Because of the re-check (*) in step 3, and because consuming is done under the
write mmu_lock, it's impossible for splits in other threads to consume pages
allocated for this split.
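To make this concrete, here is a rough sketch of the write-lock path. It is
simplified (no iter.yielded/continue handling), and the helpers
external_split_cache_empty(), topup_external_split_cache() and
consume_external_split_cache() are illustrative names, not the exact
functions in the posted diff:

static int split_prepare_write_locked(struct kvm *kvm, struct kvm_tdx *kvm_tdx)
{
        int r;

        write_lock(&kvm->mmu_lock);                           /* step 1 */
        while (external_split_cache_empty(kvm_tdx)) {         /* check topup */
                write_unlock(&kvm->mmu_lock);                 /* step 2 */

                /* enqueue PAMT pages; gets/puts the split lock internally */
                r = topup_external_split_cache(kvm_tdx);
                if (r)
                        return r;

                write_lock(&kvm->mmu_lock);                   /* step 3 */
                /* loop re-checks; back to step 2 if drained meanwhile (*) */
        }

        spin_lock(&kvm_tdx->external_kvm_split_cache_lock);
        consume_external_split_cache(kvm_tdx);                /* dequeue */
        spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);

        write_unlock(&kvm->mmu_lock);
        return 0;
}

Since both the re-check and the consume happen with the write mmu_lock held,
no other split can slip in between them and steal the just-topped-up pages.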
> > (b) When tdp_mmu_split_huge_pages_root() is invoked under read mmu_lock:
> > Introducing a new spinlock may hurt concurrency for a brief duration (which
> > is necessary).
Let's talk more about this potential future use case (i.e., multiple callers
of tdp_mmu_split_huge_pages_root() under shared mmu_lock).
The sequence would be:

1. read_lock(&kvm->mmu_lock)
   check topup
2. read_unlock(&kvm->mmu_lock)
   topup (get/put split lock to enqueue)
3. read_lock(&kvm->mmu_lock)
   check topup (goto 2 if topup is necessary)
   get split lock
   check topup (return retry if topup is necessary) (**)
   consume
   put split lock
   read_unlock(&kvm->mmu_lock)
Because of the re-check (**) in step 3, and because consuming is done under
both the split lock and the read mmu_lock, it's also impossible for splits in
other threads to consume pages allocated for this split.
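The same idea for a hypothetical shared-mmu_lock caller, again with the
made-up helper names from the sketch above; the second check under the split
lock corresponds to (**):

static int split_prepare_read_locked(struct kvm *kvm, struct kvm_tdx *kvm_tdx)
{
        int r;

retry:
        read_lock(&kvm->mmu_lock);                            /* step 1/3 */
        if (external_split_cache_empty(kvm_tdx)) {            /* check topup */
                read_unlock(&kvm->mmu_lock);                  /* step 2 */
                r = topup_external_split_cache(kvm_tdx);
                if (r)
                        return r;
                goto retry;
        }

        spin_lock(&kvm_tdx->external_kvm_split_cache_lock);
        if (external_split_cache_empty(kvm_tdx)) {            /* (**) */
                /* another reader consumed the topup; top up again */
                spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);
                read_unlock(&kvm->mmu_lock);
                goto retry;
        }
        consume_external_split_cache(kvm_tdx);
        spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);

        read_unlock(&kvm->mmu_lock);
        return 0;
}

The read mmu_lock alone can't order two readers against each other, so the
consume has to re-check under the split lock, which is what (**) provides.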
> > However, there's no known (future) use case for multiple threads invoking
> > tdp_mmu_split_huge_pages_root() on mirror root under shared mmu_lock.
> >
> > For future splitting under shared mmu_lock in the fault path, we'll use
> > the per-vCPU tdx_prealloc instead of the per-VM cache. TDX can leverage
> > kvm_get_running_vcpu() to differentiate between the two caches.
> >
> >
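For reference, the cache selection mentioned above could look roughly like
the following; the per-vCPU field name and the helper are made up, while
kvm_get_running_vcpu() and prealloc_split_cache come from this thread.
kvm_get_running_vcpu() returns NULL outside vCPU context (e.g. in a
VM-scoped ioctl), in which case the per-VM cache would be picked:

static struct tdx_prealloc *tdx_get_split_prealloc(struct kvm_tdx *kvm_tdx)
{
        struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

        /* fault path: running in vCPU context, use the per-vCPU prealloc */
        if (vcpu)
                return &to_tdx(vcpu)->prealloc;

        /* ioctl path (e.g. a split requested from user space), per-VM cache */
        return &kvm_tdx->prealloc_split_cache;
}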