Message-ID: <94f041b3aa32169fa2e1125edab7bd8fed3a6e59.camel@intel.com>
Date: Mon, 9 Feb 2026 10:33:11 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>, "Edgecombe, Rick P"
<rick.p.edgecombe@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "Li, Xiaoyao" <xiaoyao.li@...el.com>, "Zhao,
Yan Y" <yan.y.zhao@...el.com>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "kas@...nel.org" <kas@...nel.org>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "mingo@...hat.com"
<mingo@...hat.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "tglx@...nel.org"
<tglx@...nel.org>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"sagis@...gle.com" <sagis@...gle.com>, "bp@...en8.de" <bp@...en8.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when
(un)mapping private memory
On Fri, 2026-02-06 at 15:18 -0800, Sean Christopherson wrote:
> On Fri, Feb 06, 2026, Rick P Edgecombe wrote:
> > On Fri, 2026-02-06 at 08:03 -0800, Sean Christopherson wrote:
> > > > If this external cache is for PAMT page allocation for guest pages
> > > > only, here the min count should be 1 instead of PT64_ROOT_MAX_LEVEL?
> > >
> > > Oh! Right. Hmm, with that in mind, it seems like topup_external_cache()
> > > isn't quite the right interface. It's not at all clear that, unlike the
> > > other caches, the DPAMT cache isn't tied to the page tables; it's tied
> > > to the physical memory being mapped into the guest.
> > >
> > > At the very least, it seems like we should drop the @min parameter?
> > >
> > > int (*topup_external_cache)(struct kvm *kvm, struct kvm_vcpu *vcpu);
> > >
> > > Though if someone has a name that better captures what the cache is used for,
> > > without bleeding too many details into common x86...
> >
> > From the TDX perspective we have 4 types of pages that are needed to service
> > faults:
> > 1. "Control pages" (i.e. external page tables themselves)
> > 2. Private guest memory pages
> > 3. DPAMT backing pages for control pages
> > 4. DPAMT backing pages for private pages
> >
> > (3) is totally hidden now, but we need a hook to allocate (4). But from core
> > MMU's perspective we hide the existence of DPAMT backing pages. So we don't want
> > to leak that concept.
>
> Heh, there is no way around that. Common KVM needs to know that the cache is
> tied to mapping a page into the guest, otherwise the parameters don't make any
> sense whatsoever. All we can do is minimize the bleeding.

Actually, maybe we can even get rid of the DPAMT cache for the actual
private pages without introducing a new field to 'kvm_mmu_page':

The point is: once we know the PFN and the actual mapping level, we know
whether we need DPAMT pages for that PFN.  If we can learn both outside of
the MMU lock, then we can call tdx_pamt_get(PFN) directly, without needing
the "cache".
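
I.e., something like below at map time, with a matching put at zap time.
Just a sketch: it assumes DPAMT backing is only needed for 4K mappings, takes
tdx_pamt_get()/tdx_pamt_put() as pfn-based like in this thread (the real
signatures are whatever the series ends up with), and 'level', 'pfn' and 'r'
are just locals of whatever function this lands in:

        /* Map side: take the DPAMT reference directly for the pfn. */
        if (level == PG_LEVEL_4K) {
                r = tdx_pamt_get(pfn);
                if (r)
                        return r;
        }

        /* Zap/removal side: drop the reference again, no cache involved. */
        if (level == PG_LEVEL_4K)
                tdx_pamt_put(pfn);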

In the fault path, we already know the PFN after kvm_mmu_faultin_pfn(),
which runs outside of the MMU lock.  What we still don't know is the actual
mapping level, which is currently determined inside kvm_tdp_mmu_map() via
kvm_mmu_hugepage_adjust().

However, I don't see why we cannot move kvm_mmu_hugepage_adjust() out of it
to, e.g., right after kvm_mmu_faultin_pfn()?
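
Roughly something like below (a hand-wavy sketch only, not against any
particular tree, with the page-track/fast-fault/error-handling details of
the real function omitted):

        static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
                                          struct kvm_page_fault *fault)
        {
                int r;

                r = kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL);
                if (r != RET_PF_CONTINUE)
                        return r;

                /*
                 * Moved out of kvm_tdp_mmu_map(): the final mapping level
                 * (fault->goal_level) is now known before mmu_lock is
                 * taken, so TDX can take the DPAMT reference for
                 * fault->pfn at this point.
                 */
                kvm_mmu_hugepage_adjust(vcpu, fault);

                read_lock(&vcpu->kvm->mmu_lock);
                r = kvm_tdp_mmu_map(vcpu, fault);
                read_unlock(&vcpu->kvm->mmu_lock);

                return r;
        }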

If we can do this, then AFAICT we can just do:

        r = kvm_x86_call(prepare_pfn)(vcpu, fault, pfn);

somewhere after kvm_mmu_hugepage_adjust(), in which TDX can just call
tdx_pamt_get(pfn) based on the mapping level?
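
On the TDX side the hook could then be as simple as below ('prepare_pfn' is
just the name I made up above; again this assumes DPAMT backing is only
needed for 4K mappings and a pfn-based tdx_pamt_get()):

        static int tdx_prepare_pfn(struct kvm_vcpu *vcpu,
                                   struct kvm_page_fault *fault, kvm_pfn_t pfn)
        {
                if (!fault->is_private)
                        return 0;

                /* Huge mappings are covered by the static 2M/1G PAMT. */
                if (fault->goal_level > PG_LEVEL_4K)
                        return 0;

                return tdx_pamt_get(pfn);
        }

(Passing the whole 'fault' may not be ideal given 'struct kvm_page_fault' is
MMU-internal; passing just the mapping level would work too.)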

Something similar can be done for kvm_tdp_mmu_map_private_pfn(), which
already takes the 'pfn' as a parameter.

For the split path, we can obviously also get the 'pfn' from the huge SPTE.
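
E.g. something like below ('huge_spte' being the SPTE to split; how many
references actually need to be taken per 2M range is up to the series'
refcounting scheme):

        kvm_pfn_t pfn = spte_to_pfn(huge_spte);
        int r;

        /* Take the DPAMT reference(s) for the range before the demote. */
        r = tdx_pamt_get(pfn);
        if (r)
                return r;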
I kinda do wish we could get rid of the new 'struct tdx_pamt_cache' pool if
possible.
Anything I missed?