Message-ID: <4a3b48707b896d78b1cfa96ee5c20bece42b9503.camel@intel.com>
Date: Mon, 29 Sep 2025 12:10:48 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "kas@...nel.org"
<kas@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>, "mingo@...hat.com"
<mingo@...hat.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "Annapurve, Vishal" <vannapurve@...gle.com>, "Gao,
Chao" <chao.gao@...el.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 12/16] x86/virt/tdx: Add helpers to allow for
pre-allocating pages
On Sun, 2025-09-28 at 22:56 +0000, Huang, Kai wrote:
> On Fri, 2025-09-26 at 23:47 +0000, Edgecombe, Rick P wrote:
> > On Mon, 2025-09-22 at 11:20 +0000, Huang, Kai wrote:
> > > Since 'struct tdx_prealloc' replaces the KVM standard 'struct
> > > kvm_mmu_memory_cache' for the external page table, and it is allowed
> > > to fail in the "topup" operation, why not just call tdx_alloc_page()
> > > to "topup" pages for the external page table here?
> >
> > I sympathize with the intuition. It would be nice to just prep
> > everything and then operate on it like normal.
> >
> > We want this to not have to be totally redone for huge pages. In the
> > huge pages case, we could do this approach for the page tables, but for
> > the private page itself, we don't fully know whether it needs 4KB PAMT
> > backing or not before the fault.
> >
> > So we would need, like, separate pools for page tables and private
> > pages.
> >
>
> The private page itself comes from the guest_memfd via the page cache,
> so we cannot use tdx_alloc_page() for it anyway but need to call
> tdx_pamt_get() explicitly.
>
> I don't know all the details of exactly what the interaction with huge
> pages is here -- my guess would be that we only call tdx_pamt_get() when
> we find the private page is 4K rather than 2M [*], but I don't see how
> this conflicts with using tdx_alloc_page() for the page table itself.
>
> The point is that the page table itself is always 4K, therefore it is no
> different from the control pages.
>
> [*] This is actually an optimization, not a must, for supporting
> hugepages with DPAMT AFAICT. Theoretically, we could always allocate
> DPAMT pages upfront for a hugepage at allocation time in guest_memfd,
> regardless of whether it is actually mapped as a hugepage in the S-EPT,
> and we could free the DPAMT pages when we promote 4K pages to a hugepage
> in the S-EPT, to effectively reduce the number of DPAMT pages used for
> hugepages.

On second thought, please ignore the [*], since I am not sure whether
allocating DPAMT pages upfront for hugepages is a good idea. I suppose
hugepages and 4K pages can convert to each other at runtime, so perhaps
it's better to handle DPAMT pages when KVM actually maps the leaf TDX
private page.
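
Roughly something like the sketch below -- only to show the idea, the
helper name is made up and I am not sure about the exact tdx_pamt_get()
signature in the series.  The point is that with DPAMT only the 4K PAMT
level is dynamic, so a 2M leaf is still covered by the static PAMT:

        /*
         * Sketch only: helper name and the tdx_pamt_get() signature are
         * assumptions.  A 2M (or larger) leaf needs no dynamic PAMT; only
         * a 4K leaf needs backing added before it is mapped in the S-EPT.
         */
        static int tdx_pamt_get_leaf(kvm_pfn_t pfn, int level)
        {
                if (level > PG_LEVEL_4K)
                        return 0;

                return tdx_pamt_get(pfn_to_page(pfn));
        }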

So it seems inevitable that we need to manage a pool for the leaf TDX
private pages -- for hugepage support at least, since w/o hugepage we may
be able to avoid this pool (e.g., theoretically, we could do
tdx_pamt_get() after kvm_mmu_faultin_pfn(), where fault->pfn is ready).
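
I.e., very roughly something like below for the "no pool" option.  Again
just a sketch of the ordering in the fault path; apart from
kvm_mmu_faultin_pfn() and fault->pfn, the placement and error handling
are illustrative only:

        r = kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL);
        if (r != RET_PF_CONTINUE)
                return r;

        if (fault->is_private) {
                /* Add DPAMT backing for the 4K leaf before mapping it. */
                r = tdx_pamt_get(pfn_to_page(fault->pfn));
                if (r)
                        return r;
        }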

Given you said "We want this to not have to be totally redone for huge
pages.", I can see why you want to use a single pool for both the page
tables and the leaf TDX private pages.

But maybe another strategy is to use the simplest way for the initial
DPAMT support w/o hugepage support first, and leave this to the hugepage
support series. I don't know. Just my 2 cents.