Message-ID: <9018c8629c921fae9ee993cd83b5a189616f51b0.camel@intel.com>
Date: Sun, 28 Sep 2025 22:56:25 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"seanjc@...gle.com" <seanjc@...gle.com>, "mingo@...hat.com"
<mingo@...hat.com>, "kas@...nel.org" <kas@...nel.org>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"tglx@...utronix.de" <tglx@...utronix.de>, "Annapurve, Vishal"
<vannapurve@...gle.com>, "Gao, Chao" <chao.gao@...el.com>, "Edgecombe, Rick
P" <rick.p.edgecombe@...el.com>, "bp@...en8.de" <bp@...en8.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 12/16] x86/virt/tdx: Add helpers to allow for
pre-allocating pages
On Fri, 2025-09-26 at 23:47 +0000, Edgecombe, Rick P wrote:
> On Mon, 2025-09-22 at 11:20 +0000, Huang, Kai wrote:
> > Since 'struct tdx_prealloc' replaces the KVM standard 'struct
> > kvm_mmu_memory_cache' for external page table, and it is allowed to
> > fail in "topup" operation, why not just call tdx_alloc_page() to
> > "topup" page for external page table here?
>
> I sympathize with the intuition. It would be nice to just prep
> everything and then operate on it like normal.
>
> We want this to not have to be totally redone for huge pages. In the
> huge pages case, we could do this approach for the page tables, but for
> the private page itself, we don't know whether we need 4KB PAMT backing
> or not. So we don't fully know whether a TDX private page needs PAMT
> 4KB backing or not before the fault.
>
> So we would need, like, separate pools for page tables and private
> pages.
>
The private page itself comes from guest_memfd via the page cache, so we
cannot use tdx_alloc_page() for it anyway; we need to call tdx_pamt_get()
explicitly.
I don't know all the details of the interaction with huge pages here -- my
guess would be that we only call tdx_pamt_get() when we find the private
page is 4K rather than 2M [*], but I don't see how this conflicts with
using tdx_alloc_page() for the page table itself.
The point is the page table itself is always 4K, so it is no different
from the control pages.
[*] This is actually an optimization, not a must, for supporting hugepages
with DPAMT AFAICT. Theoretically, we could always allocate DPAMT pages
upfront for a hugepage at allocation time in guest_memfd, regardless of
whether it is actually mapped as a hugepage in the S-EPT, and we could
free the DPAMT pages when we promote 4K pages to a hugepage in the S-EPT,
to effectively reduce the DPAMT pages used for hugepages.
> Or someway to unwind the wrong guess of small page. At that
> point I don't think it's simpler.
>
> >
> > I don't think we need to keep all "DPAMT pages" in the pool, right?
>
> Not sure what you mean by this.
I mean we don't need to keep DPAMT pages in the list of 'struct
tdx_prealloc'. tdx_pamt_put() gets the DPAMT pages back from the TDX
module and just frees them.
>
> >
> > If tdx_alloc_page() succeeds, then the "DPAMT pages" are also
> > "topup"ed, and PAMT entries for the 2M range of the SEPT page is
> > ready too.
> >
> > This at least avoids having to export tdx_dpamt_entry_pages(), which
> > is not nice IMHO. And I think it should yield simpler code.
>
> I mean less exports is better, but I don't follow what is so egregious.
> It's not called from core KVM code.
>
> >
> > One more thinking:
> >
> > I also have been thinking whether we can continue to use the KVM
> > standard 'struct kvm_mmu_memory_cache' for S-EPT pages. Below is one
> > more idea for your reference.
>
> The point of the new struct was to hand it to the arch/x86 side of the
> house. If we don't need to do that, then yes we could have options. And
> Dave suggested another struct that could be used to hand off the cache.
>
> >
> > In the previous discussion I think we concluded the 'kmem_cache'
> > doesn't work nicely with DPAMT (due to the ctor() cannot fail etc).
> > And when we don't use 'kmem_cache', KVM just call __get_free_page()
> > to topup objects.
> > But we need tdx_alloc_page() for allocation here, so this is the
> > problem.
> >
> > If we add two callbacks for object allocation/free to 'struct
> > kvm_mmu_memory_cache', then we can have place to hook
> > tdx_alloc_page().
>
> kvm_mmu_memory_cache has a lot of options at this point. All we really
> need is a list. I'm not sure it makes sense to keep cramming things
> into it?
It comes down to whether we want to continue to reuse
'kvm_mmu_memory_cache' (which is already implemented in KVM), or we want
to use different infrastructure for tracking S-EPT pages.
Anyway, just my 2 cents for your reference.