Message-ID: <24665176b1e6b169441c9f6db9b5d02d073377a4.camel@intel.com>
Date: Tue, 20 Jan 2026 19:53:19 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
	<linux-coco@...ts.linux.dev>, "Huang, Kai" <kai.huang@...el.com>, "Li,
 Xiaoyao" <xiaoyao.li@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
	"Zhao, Yan Y" <yan.y.zhao@...el.com>, "Wu, Binbin" <binbin.wu@...el.com>,
	"kas@...nel.org" <kas@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "mingo@...hat.com" <mingo@...hat.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"Annapurve, Vishal" <vannapurve@...gle.com>, "Gao, Chao"
	<chao.gao@...el.com>, "bp@...en8.de" <bp@...en8.de>, "x86@...nel.org"
	<x86@...nel.org>
Subject: Re: [PATCH v4 11/16] KVM: TDX: Add x86 ops for external spt cache

Sean, really appreciate you taking a look despite being overbooked.

On Fri, 2026-01-16 at 16:53 -0800, Sean Christopherson wrote:
> NAK.  I kinda sorta get why you did this?  But the pages KVM uses for page tables
> are KVM's, not to be mixed with PAMT pages.
> 
> Eww.  Definitely a hard "no".  In tdp_mmu_alloc_sp_for_split(), the allocation
> comes from KVM:
> 
> 	if (mirror) {
> 		sp->external_spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> 		if (!sp->external_spt) {
> 			free_page((unsigned long)sp->spt);
> 			kmem_cache_free(mmu_page_header_cache, sp);
> 			return NULL;
> 		}
> 	}

Ah, this is from the TDX huge pages series. There is a bit of fallout from
TDX/coco's eternal nemesis: stacks of code all being co-designed at once.

Dave has been directing us recently to focus only on the needs of the current
series. Now that we can test at each incremental step, we don't have the same
problems as before. But of course there is still a desire for an updated TDX
huge pages series, etc., to help with development of all the other WIP stuff.

For this design aspect of how the topup caches work for DPAMT, he asked
specifically for the DPAMT patches to *not* consider how TDX huge pages will use
them.

Now, the TDX huge pages cover letter asked you to look at some aspects of that,
and traditionally the KVM side has preferred to look at how the code is all
going to work together. The presentation of this was a bit rushed and
confusing, but looking forward, how do you want to do this?

After the 130-patch ordeal, I'm somewhat amenable to Dave's view. What do you
think?

> 
> But then in kvm_tdp_mmu_map(), via kvm_mmu_alloc_external_spt(), the allocation
> comes from get_tdx_prealloc_page()
> 
>   static void *tdx_alloc_external_fault_cache(struct kvm_vcpu *vcpu)
>   {
> 	struct page *page = get_tdx_prealloc_page(&to_tdx(vcpu)->prealloc);
> 
> 	if (WARN_ON_ONCE(!page))
> 		return (void *)__get_free_page(GFP_ATOMIC | __GFP_ACCOUNT);
> 
> 	return page_address(page);
>   }
> 
> But then regardless of where the page came from, KVM frees it.  Seriously.
> 
>   static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
>   {
> 	free_page((unsigned long)sp->external_spt);  <=====

Ah! Ok, we might have a good option here. Kai had suggested that instead of
pre-faulting all the pages that we might need for DPAMT backing, we pre-install
the DPAMT backing for all pages in the external page table cache at top-up
time. It has a complication for TDX huge pages, which needs to decide late
whether a page is huge and needs DPAMT backing or not, but based on Dave's
direction above I gave it a try. A downside for pure DPAMT was that it needed
to handle the freeing specially, because it potentially needed to reclaim the
DPAMT backing. This required an x86 op for freeing these pages, and was a bit
more code on the KVM side in general.

But if the asymmetry is a problem either way, maybe I'll give it a try for v5. 

> 	free_page((unsigned long)sp->spt);
> 	kmem_cache_free(mmu_page_header_cache, sp);
>   }
> 
> Oh, and the hugepage series also fumbles its topup (why there's yet another
> topup API, I have no idea).
> 
>   static int tdx_topup_vm_split_cache(struct kvm *kvm, enum pg_level level)
>   {
> 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> 	struct tdx_prealloc *prealloc = &kvm_tdx->prealloc_split_cache;
> 	int cnt = tdx_min_split_cache_sz(kvm, level);
> 
> 	while (READ_ONCE(prealloc->cnt) < cnt) {
> 		struct page *page = alloc_page(GFP_KERNEL);  <==== GFP_KERNEL_ACCOUNT
> 
> 		if (!page)
> 			return -ENOMEM;
> 
> 		spin_lock(&kvm_tdx->prealloc_split_cache_lock);
> 		list_add(&page->lru, &prealloc->page_list);
> 		prealloc->cnt++;
> 		spin_unlock(&kvm_tdx->prealloc_split_cache_lock);
> 	}
> 
> 	return 0;
>   }
