Message-ID: <aXFPNbCvKURxby1q@google.com>
Date: Wed, 21 Jan 2026 14:12:05 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, Kai Huang <kai.huang@...el.com>,
Xiaoyao Li <xiaoyao.li@...el.com>, Dave Hansen <dave.hansen@...el.com>,
Yan Y Zhao <yan.y.zhao@...el.com>, Binbin Wu <binbin.wu@...el.com>,
"kas@...nel.org" <kas@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "mingo@...hat.com" <mingo@...hat.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
Isaku Yamahata <isaku.yamahata@...el.com>, Vishal Annapurve <vannapurve@...gle.com>,
Chao Gao <chao.gao@...el.com>, "bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v4 11/16] KVM: TDX: Add x86 ops for external spt cache

On Tue, Jan 20, 2026, Rick P Edgecombe wrote:
> Sean, really appreciate you taking a look despite being overbooked.
>
> On Fri, 2026-01-16 at 16:53 -0800, Sean Christopherson wrote:
> > NAK. I kinda sorta get why you did this? But the pages KVM uses for page tables
> > are KVM's, not to be mixed with PAMT pages.
> >
> > Eww. Definitely a hard "no". In tdp_mmu_alloc_sp_for_split(), the allocation
> > comes from KVM:
> >
> > 	if (mirror) {
> > 		sp->external_spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> > 		if (!sp->external_spt) {
> > 			free_page((unsigned long)sp->spt);
> > 			kmem_cache_free(mmu_page_header_cache, sp);
> > 			return NULL;
> > 		}
> > 	}
>
> Ah, this is from the TDX huge pages series. There is a bit of fallout from
> TDX/coco's eternal nemesis: stacks of code all being co-designed at once.
>
> Dave has been directing us recently to focus only on the needs of the current
> series. Now that we can test at each incremental step, we don't have the same
> problems as before. But of course there is still a desire for an updated TDX
> huge pages series, etc., to help with development of all the other WIP stuff.
>
> For this particular design aspect, how the topup caches work for DPAMT, he
> asked specifically for the DPAMT patches to *not* consider how TDX huge pages
> will use them.
>
> Now, the TDX huge pages cover letter asked you to look at some aspects of
> that, and traditionally the KVM side has preferred to look at how the code is
> all going to work together. The presentation of this was a bit rushed and
> confused, but looking forward, how do you want to do this?
>
> After the 130-patch ordeal, I'm somewhat amenable to Dave's view. What do you
> think?

IMO, it's largely irrelevant for this discussion. Bluntly, the code proposed
here is simply bad. S-EPT hugepage support just makes it worse.

The core issue is that ownership of the pre-allocation cache is split across
KVM and the TDX subsystem (and within KVM, between tdx.c and the MMU). That
split makes it extremely difficult to understand who is responsible for what,
which in turn leads to brittle code and sets the hugepage series up to fail,
e.g. by unnecessarily mixing S-EPT page allocation with PAMT maintenance.
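
To illustrate the kind of separation I'm after (a rough sketch only, the
struct members and helpers below are hypothetical, not a demand for this
exact shape): PAMT pre-allocation would live entirely in tdx.c behind its
own cache, so the MMU never touches PAMT pages and tdx.c never reaches into
the MMU's page table caches.

	/*
	 * Sketch only: pamt_page_cache is a hypothetical field, not an
	 * existing one.  The cache is topped up and consumed solely by
	 * tdx.c, using the generic kvm_mmu_memory_cache helpers.
	 */
	static int tdx_topup_pamt_cache(struct vcpu_tdx *tdx, int min)
	{
		return kvm_mmu_topup_memory_cache(&tdx->pamt_page_cache, min);
	}

	static void *tdx_alloc_pamt_page(struct vcpu_tdx *tdx)
	{
		return kvm_mmu_memory_cache_alloc(&tdx->pamt_page_cache);
	}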

That aside, I generally agree with Dave. The only caveat I'll throw in is that
I do think we need to _at least_ consider how things will likely play out when
all is said and done, otherwise we'll probably paint ourselves into a corner.
E.g. we don't need to know exactly how S-EPT hugepage support will interact
with DPAMT, but IMO we do need to be aware that KVM will need to demote pages
outside of vCPU context, and thus will need to pre-allocate pages for PAMT
without having a loaded/running vCPU. That knowledge doesn't require active
support in the DPAMT series, but it most definitely influences design
decisions.
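
For example (again purely a sketch, with hypothetical per-VM fields, just to
show the shape): a PAMT cache hanging off the VM that whatever task ends up
doing the demote can top up before taking mmu_lock, instead of hanging the
cache off of kvm_vcpu.

	/*
	 * Sketch only: pamt_page_cache and pamt_cache_lock are hypothetical
	 * per-VM fields.  Topping up from VM context means zap/recovery
	 * paths can demote without a loaded vCPU.
	 */
	static int tdx_topup_vm_pamt_cache(struct kvm *kvm, int min)
	{
		struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
		int r;

		mutex_lock(&kvm_tdx->pamt_cache_lock);
		r = kvm_mmu_topup_memory_cache(&kvm_tdx->pamt_page_cache, min);
		mutex_unlock(&kvm_tdx->pamt_cache_lock);
		return r;
	}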