linux-kernel - Re: [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when (un)mapping private memory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aYZ2qft-akOYwkOk@google.com>
Date: Fri, 6 Feb 2026 15:18:01 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: Yan Y Zhao <yan.y.zhao@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, Xiaoyao Li <xiaoyao.li@...el.com>, 
	Kai Huang <kai.huang@...el.com>, 
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "kas@...nel.org" <kas@...nel.org>, 
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "mingo@...hat.com" <mingo@...hat.com>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	"sagis@...gle.com" <sagis@...gle.com>, "tglx@...nel.org" <tglx@...nel.org>, "bp@...en8.de" <bp@...en8.de>, 
	Vishal Annapurve <vannapurve@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when
 (un)mapping private memory

On Fri, Feb 06, 2026, Rick P Edgecombe wrote:
> On Fri, 2026-02-06 at 08:03 -0800, Sean Christopherson wrote:
> > > If this external cache is for PAMT pages allocation for guest pages only,
> > > here
> > > the min count should be 1 instead of PT64_ROOT_MAX_LEVEL?
> > 
> > Oh!  Right.  Hmm, with that in mind, it seems like topup_external_cache()
> > isn't
> > quite the right interace.  It's not at all clear that, unlike the other
> > caches,
> > the DPAMT cache isn't tied to the page tables, it's tied to the physical
> > memory
> > being mapped into the guest.
> > 
> > At the very least, it seems like we should drop the @min parameter?
> > 
> > 	int (*topup_external_cache)(struct kvm *kvm, struct kvm_vcpu *vcpu);
> > 
> > Though if someone has a name that better captures what the cache is used for,
> > without bleeding too many details into common x86...
> 
> From the TDX perspective we have 4 types of pages that are needed to service
> faults:
> 1. "Control pages" (i.e. external page tables themselves)
> 2. Private guest memory pages
> 3. DPAMT backing pages for control pages
> 4. DPAMT backing pages for private pages
> 
> (3) is totally hidden now, but we need a hook to allocate (4). But from core
> MMU's perspective we hide the existence of DPAMT backing pages. So we don't want
> to leak that concept.

Heh, there is no way around that.  Common KVM needs to know that the cache is
tied to mapping a page into the guest, otherwise the parameters don't make any
sense whatsoever.  All we can do is minimize the bleeding.

> The page we need is kind of like something to "prepare" the private page before
> installing it. It actually isn't that related to the mirror/external concept. So
> if we separate it from "external" and make it about installing private guest
> memory, it fits better conceptually I think. But it could be a bit confusing for
> other types of VMs who have to trace to see if anything special is happening
> inside the op for their private memory. In that case it could be:
> 
> (*topup_private_mem_prepare_cache)(struct kvm_vcpu *vcpu)

topup + prepare is redundant and confusing.

> The core MMU doesn't know about DPAMT backing pages, but it does know about the
> set_external_spte op that consumes this cache. So how about the slightly
> misleading:
> 
> (*topup_set_external_spte_cache)(struct kvm_vcpu *vcpu)

I really, really, want to avoid "SPTE", because the cache isn't for the SPTE in
any way, it's for the memory that's _pointed_ at by the SPTE.  And the confusion
is exactly what prompted this thread: I forgot that it's not every SPTE in the
chain that needs DPAMT backing, it's only the page that's being mapped into the
guest.

How about?

  (*topup_private_mapping_cache)

Because it's not just "private memory" it's specifically the mapping.  E.g. for
the hugepage split case, the primary memory is already assigned and mapped into
the guest, but a topup is still needed because KVM is creating a new/different
mapping.

> It is easier for other VM types to ignore, and not that semantically wrong from
> what is happening on the TDX side.