Message-ID: <aTZmY1P6uq8KWwKr@yzhao56-desk.sh.intel.com>
Date: Mon, 8 Dec 2025 13:49:15 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sagi Shahar <sagis@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>, <kas@...nel.org>,
<tabba@...gle.com>, <ackerleytng@...gle.com>, <quic_eberman@...cinc.com>,
<michael.roth@....com>, <david@...hat.com>, <vannapurve@...gle.com>,
<vbabka@...e.cz>, <thomas.lendacky@....com>, <pgonda@...gle.com>,
<zhiquan1.li@...el.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
<ira.weiny@...el.com>, <isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>,
<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH v2 21/23] KVM: TDX: Preallocate PAMT pages to be used
in split path
On Fri, Dec 05, 2025 at 12:14:46AM -0600, Sagi Shahar wrote:
> On Thu, Aug 7, 2025 at 4:48 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
> >
> > Preallocate a page to be used in the split_external_spt() path.
> >
> > The kernel needs one PAMT page pair for external_spt and one that is
> > provided directly to the TDH.MEM.PAGE.DEMOTE SEAMCALL.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > Co-developed-by: Yan Zhao <yan.y.zhao@...el.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> > ---
> > RFC v2:
> > - Pulled from
> > git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git tdx/dpamt-huge.
> > - Implemented the flow of topup pamt_page_cache in
> > tdp_mmu_split_huge_pages_root() (Yan)
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 ++
> > arch/x86/kvm/mmu/mmu.c | 1 +
> > arch/x86/kvm/mmu/tdp_mmu.c | 51 +++++++++++++++++++++++++++++++++
> > 3 files changed, 54 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 6b6c46c27390..508b133df903 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1591,6 +1591,8 @@ struct kvm_arch {
> >  #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
> >  	struct kvm_mmu_memory_cache split_desc_cache;
> >
> > +	struct kvm_mmu_memory_cache pamt_page_cache;
> > +
>
> The latest DPAMT patches use a per-vcpu tdx_prealloc struct to handle
> preallocating pages for pamt. I'm wondering if you've considered how
> this would work here since some of the calls requiring pamt originate
> from user space ioctls and therefore are not associated with a vcpu.
I'll use a per-VM tdx_prealloc struct for splitting here, similar to the
per-VM pamt_page_cache.
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 43dd295b7fd6..91bea25da528 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -48,6 +48,9 @@ struct kvm_tdx {
 	 * Set/unset is protected with kvm->mmu_lock.
 	 */
 	bool wait_for_sept_zap;
+
+	spinlock_t external_kvm_split_cache_lock;
+	struct tdx_prealloc prealloc_split_cache;
 };
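
FWIW, the new lock also needs a spin_lock_init() at VM creation. A minimal
sketch, assuming it goes in tdx_vm_init() (the placement is illustrative, not
taken from the series):

	/* e.g. in tdx_vm_init(): initialize the per-VM split cache lock */
	spin_lock_init(&to_kvm_tdx(kvm)->external_kvm_split_cache_lock);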
> Since the tdx_prealloc is a per-vCPU struct there are no race issues
> when multiple vcpus need to add pamt pages, but here it would be
> trickier because, theoretically, multiple threads could split
> different pages simultaneously.
A spinlock, external_kvm_split_cache_lock, is introduced to protect the cache
enqueue and dequeue (a rough sketch of the TDX-side callbacks serializing on
this lock is included after the (a)/(b) notes below).
(a) When tdp_mmu_split_huge_pages_root() is invoked under write mmu_lock:
    - Since the cache dequeue is already under write mmu_lock in
      tdp_mmu_split_huge_page()-->tdx_sept_split_private_spte(), acquiring/
      releasing the additional spinlock there is harmless.
    - Though the cache enqueue in topup_external_split_cache() is not
      protected by mmu_lock, protecting the enqueue with a spinlock should
      not reduce concurrency.
(b) When tdp_mmu_split_huge_pages_root() is invoked under read mmu_lock:
    Introducing a new spinlock may hurt concurrency for the brief duration
    it is held (which is unavoidable).
    However, there's no known (future) use case for multiple threads
    invoking tdp_mmu_split_huge_pages_root() on a mirror root under shared
    mmu_lock.

For future splitting under shared mmu_lock in the fault path, we'll use
the per-vCPU tdx_prealloc instead of the per-VM cache. TDX can leverage
kvm_get_running_vcpu() to differentiate between the two caches.
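
To make the locking concrete, below is a minimal sketch (not part of the diff
that follows) of how the TDX-side ops could look. Only
external_kvm_split_cache_lock and prealloc_split_cache come from the tdx.h
diff above; split_cache_nr_pages(), split_cache_min_pages(),
split_cache_push_page() and the per-vCPU 'prealloc' field are placeholders
for whatever accessors the DPAMT series actually provides.

/*
 * Sketch: per-VM split cache callbacks, serialized by
 * external_kvm_split_cache_lock.  Helper names are placeholders.
 */
static bool tdx_need_topup_external_per_vm_split_cache(struct kvm *kvm,
							int level)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
	bool need;

	spin_lock(&kvm_tdx->external_kvm_split_cache_lock);
	need = split_cache_nr_pages(&kvm_tdx->prealloc_split_cache) <
	       split_cache_min_pages(level);
	spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);

	return need;
}

static int tdx_topup_external_per_vm_split_cache(struct kvm *kvm, int level)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);

	/* Runs outside mmu_lock; only the cache lock serializes the enqueue. */
	while (tdx_need_topup_external_per_vm_split_cache(kvm, level)) {
		struct page *page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);

		if (!page)
			return -ENOMEM;

		spin_lock(&kvm_tdx->external_kvm_split_cache_lock);
		split_cache_push_page(&kvm_tdx->prealloc_split_cache, page);
		spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);
	}

	return 0;
}

/*
 * Sketch: pick the per-vCPU tdx_prealloc when splitting from the (future)
 * fault path, and the per-VM cache for ioctl-driven splits.  The vcpu
 * 'prealloc' field is assumed here, not taken from the series.
 */
static struct tdx_prealloc *tdx_split_prealloc(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

	if (vcpu)
		return &to_tdx(vcpu)->prealloc;

	return &to_kvm_tdx(kvm)->prealloc_split_cache;
}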
Here's the new diff for the TDP MMU, rebased onto Dynamic PAMT v4.
+/*
+ * Check the per-VM external split cache under write mmu_lock or read mmu_lock
+ * in tdp_mmu_split_huge_pages_root().
+ *
+ * When need_topup_external_split_cache() returns true, the mmu_lock is held
+ * throughout
+ * (a) need_topup_external_split_cache(), and
+ * (b) the cache consumption (in tdx_sept_split_private_spte() called by
+ * tdp_mmu_split_huge_page()).
+ *
+ * Throughout the execution from (a) to (b):
+ * - With write mmu_lock, the per-VM external split cache is exclusively
+ * accessed by a single user. Therefore, the result returned from
+ * need_topup_external_split_cache() is accurate.
+ *
+ * - With read mmu_lock, the per-VM external split cache can be shared among
+ * multiple users. Cache consumption in tdx_sept_split_private_spte() thus
+ * needs to re-check the cache page count after acquiring its internal
+ * split cache lock and return an error if the count is insufficient.
+ */
+static bool need_topup_external_split_cache(struct kvm *kvm, int level)
+{
+	return kvm_x86_call(need_topup_external_per_vm_split_cache)(kvm, level);
+}
+
+static int topup_external_split_cache(struct kvm *kvm, int level, bool shared)
+{
+	int r;
+
+	rcu_read_unlock();
+
+	if (shared)
+		read_unlock(&kvm->mmu_lock);
+	else
+		write_unlock(&kvm->mmu_lock);
+
+	r = kvm_x86_call(topup_external_per_vm_split_cache)(kvm, level);
+
+	if (shared)
+		read_lock(&kvm->mmu_lock);
+	else
+		write_lock(&kvm->mmu_lock);
+
+	if (!r)
+		rcu_read_lock();
+
+	return r;
+}
 
 static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 					 struct kvm_mmu_page *root,
 					 gfn_t start, gfn_t end,
@@ -1673,6 +1723,23 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 			continue;
 		}
 
+		if (is_mirror_sp(root) &&
+		    need_topup_external_split_cache(kvm, iter.level)) {
+			int r;
+
+			r = topup_external_split_cache(kvm, iter.level, shared);
+
+			if (r) {
+				trace_kvm_mmu_split_huge_page(iter.gfn,
+							      iter.old_spte,
+							      iter.level, r);
+				return r;
+			}
+
+			iter.yielded = true;
+			continue;
+		}
+
 		tdp_mmu_init_child_sp(sp, &iter);
 
 		if (tdp_mmu_split_huge_page(kvm, &iter, sp, shared))
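
For completeness, a similarly rough sketch of the consumer side that the
comment above describes: under read mmu_lock the dequeue must re-check the
page count after taking the cache lock and fail if a concurrent splitter got
there first. split_cache_nr_pages()/split_cache_pop_page() are the same
placeholder helpers as in the earlier sketch.

/*
 * Sketch: dequeue nr_pages pages for one split, re-checking the count under
 * the lock since need_topup_external_split_cache() may be stale under read
 * mmu_lock.
 */
static int tdx_split_cache_get_pages(struct kvm *kvm, struct page **pages,
				     int nr_pages)
{
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
	int i;

	spin_lock(&kvm_tdx->external_kvm_split_cache_lock);
	if (split_cache_nr_pages(&kvm_tdx->prealloc_split_cache) < nr_pages) {
		spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);
		return -ENOMEM;
	}

	for (i = 0; i < nr_pages; i++)
		pages[i] = split_cache_pop_page(&kvm_tdx->prealloc_split_cache);
	spin_unlock(&kvm_tdx->external_kvm_split_cache_lock);

	return 0;
}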