linux-kernel - Re: [PATCHv2 00/12] TDX: Enable Dynamic PAMT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAGtprH9x8vATTX612ZUf-wJmAbn+=LUTP-SOnkh-GTUHmW3T-w@mail.gmail.com>
Date: Tue, 12 Aug 2025 15:00:43 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
Cc: "seanjc@...gle.com" <seanjc@...gle.com>, "Gao, Chao" <chao.gao@...el.com>, 
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "Huang, Kai" <kai.huang@...el.com>, 
	"kas@...nel.org" <kas@...nel.org>, "bp@...en8.de" <bp@...en8.de>, "mingo@...hat.com" <mingo@...hat.com>, 
	"Zhao, Yan Y" <yan.y.zhao@...el.com>, "x86@...nel.org" <x86@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>, 
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "tglx@...utronix.de" <tglx@...utronix.de>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCHv2 00/12] TDX: Enable Dynamic PAMT

> On Tue, Aug 12, 2025 at 11:39 AM Edgecombe, Rick P <rick.p.edgecombe@...el.com> wrote:
>
> On Tue, 2025-08-12 at 09:15 -0700, Sean Christopherson wrote:
> > > I actually went down this path too, but the problem I hit was that TDX
> > > module wants the PAMT page size to match the S-EPT page size.
> >
> > Right, but over-populating the PAMT would just result in "wasted" memory,
> > correct? I.e. KVM can always provide more PAMT entries than are needed.  Or am
> > I misunderstanding how dynamic PAMT works?
>
> Demote needs DPAMT pages in order to split the DPAMT. But "needs" is what I was
> hoping to understand better.
>
> I do think though, that we should consider premature optimization vs re-
> architecting DPAMT only for the sake of a short term KVM design. As in, if fault
> path managed DPAMT is better for the whole lazy accept way of things, it
> probably makes more sense to just do it upfront with the existing architecture.
>
> BTW, I think I untangled the fault path DPAMT page allocation code in this
> series. I basically moved the existing external page cache allocation to
> kvm/vmx/tdx.c. So the details of the top up and external page table cache
> happens outside of x86 mmu code. The top up structure comes from arch/x86 side
> of tdx code, so the cache can just be passed into tdx_pamt_get(). And from the
> MMU code's perspective there is just one type "external page tables". It doesn't
> know about DPAMT at all.
>
> So if that ends up acceptable, I think the main problem left is just this global
> lock. And it seems we have a simple solution for it if needed.
>
> >
> > In other words, IMO, reclaiming PAMT pages on-demand is also a premature
> > optimization of sorts, as it's not obvious to me that the host would actually
> > be able to take advantage of the unused memory.
>
> I was imagining some guestmemfd callback to setup DPAMT backing for all the
> private memory. Just leave it when it's shared for simplicity. Then cleanup
> DPAMT when the pages are freed from guestmemfd. The control pages could have
> their own path like it does in this series. But it doesn't seem supported.

IMO, tieing lifetime of guest_memfd folios with that of KVM ownership
beyond the memslot lifetime is leaking more state into guest_memfd
than needed. e.g. This will prevent usecases where guest_memfd needs
to be reused while handling reboot of a confidential VM [1].

IMO, if avoidable, its better to not have DPAMT or generally other KVM
arch specific state tracking hooked up to guest memfd folios specially
with hugepage support and whole folio splitting/merging that needs to
be done. If you still need it, guest_memfd should be stateless as much
as possible just like we are pushing for SNP preparation tracking [2]
to happen within KVM SNP and IMO any such tracking should ideally be
cleaned up on memslot unbinding.

[1] https://lore.kernel.org/kvm/CAGtprH9NbCPSwZrQAUzFw=4rZPA60QBM2G8opYo9CZxRiYihzg@mail.gmail.com/
[2] https://lore.kernel.org/kvm/20250613005400.3694904-2-michael.roth@amd.com/