Message-ID: <aXuVR0kq_K1TYwlR@char.us.oracle.com>
Date: Thu, 29 Jan 2026 12:13:43 -0500
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Kiryl Shutsemau <kas@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev,
kvm@...r.kernel.org, Kai Huang <kai.huang@...el.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Yan Zhao <yan.y.zhao@...el.com>,
Vishal Annapurve <vannapurve@...gle.com>,
Ackerley Tng <ackerleytng@...gle.com>, Sagi Shahar <sagis@...gle.com>,
Binbin Wu <binbin.wu@...ux.intel.com>,
Xiaoyao Li <xiaoyao.li@...el.com>,
Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage
On Wed, Jan 28, 2026 at 05:14:32PM -0800, Sean Christopherson wrote:
> This is a combined series of Dynamic PAMT (from Rick), and S-EPT hugepage
> support (from Yan). Except for some last minute tweaks to the DPAMT array
> args stuff, a version of this based on a Google-internal kernel has been
> moderately well tested (thanks Vishal!). But overall it's still firmly RFC
> as I have deliberately NOT addressed others' feedback from v4 of DPAMT and v3
What does PAMT stand for? Is there a design document somewhere?
> of S-EPT hugepage (mostly lack of cycles), and there's at least one patch in
> here that shouldn't be merged as-is (the quick-and-dirty switch from struct
> page to raw pfns).
>
> My immediate goal is to solidify the designs for DPAMT and S-EPT hugepage.
> Given the substantial design changes I am proposing, posting an end-to-end
> RFC seemed like a much better method than trying to communicate my thoughts
> piecemeal.
>
> As for landing these series, I think the fastest overall approach would be
> to land patches 1-4 asap (tangentially related cleanups and fixes), agree
Should they be split out as non-RFC then?
> on a design (hopefully), and then hand control back to Rick and Yan to polish
> their respective series for merge.
>
> I also want to land the VMXON series[*] before DPAMT, because there's a nasty
> wart where KVM wires up a DPAMT-specific hook even if DPAMT is disabled,
> because KVM's ordering needs to set the vendor hooks before tdx_sysinfo is
> ready. Decoupling VMXON from KVM solves that problem, because it lets the
> TDX subsystem parse sysinfo before TDX is loaded.
>
> Beyond that dependency, I am comfortable landing both DPAMT and S-EPT hugepage
> support without any other prereqs, i.e. without an in-tree way to light up
> the S-EPT hugepage code due to lack of hugepage support in guest_memfd.
Could there be test cases? Or simple code posted for QEMU, which is the
tool that 99% of kernel engineers use?
> Outside of the guest_memfd arch hook for in-place conversion, S-EPT hugepage
> support doesn't have any direct dependencies/conflicts with guest_memfd
> hugepage or in-place conversion support (which is great, because it means we
> didn't totally botch the design!). E.g. Vishal's been able to test this code
> precisely because it applies relatively cleanly on an internal branch with a
> whole pile of guest_memfd changes.
>
> Applies on kvm-x86 next (specifically kvm-x86-next-2026.01.23).
>
> [*] https://lore.kernel.org/all/20251206011054.494190-1-seanjc@google.com
>
> P.S. I apologize if I clobbered any of the Author attribution or SoBs. I
> was moving patches around and synchronizing between an internal tree
> and this upstream version, so things may have gotten a bit wonky.
>
> Isaku Yamahata (1):
> KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table splitting
>
> Kiryl Shutsemau (12):
> x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
> x86/tdx: Add helpers to check return status codes
> x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
> x86/virt/tdx: Allocate reference counters for PAMT memory
> x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
> x86/virt/tdx: Add tdx_alloc/free_control_page() helpers
> x86/virt/tdx: Optimize tdx_alloc/free_control_page() helpers
> KVM: TDX: Allocate PAMT memory for TD and vCPU control structures
> KVM: TDX: Get/put PAMT pages when (un)mapping private memory
> x86/virt/tdx: Enable Dynamic PAMT
> Documentation/x86: Add documentation for TDX's Dynamic PAMT
> x86/virt/tdx: Get/Put DPAMT page pair if and only if mapping size is 4KB
>
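For readers sharing the PAMT question above: PAMT is the TDX Module's Physical Address Metadata Table, the memory it uses to track the state of each physical page it manages. Judging only from the patch titles ("Allocate reference counters for PAMT memory", "Get/Put DPAMT page pair if and only if mapping size is 4KB"), Dynamic PAMT appears to allocate the 4KB-level PAMT backing on demand per 2MB region, reference-counted by the number of 4KB mappings in that region. The toy model below is a minimal single-threaded sketch under that assumption; every name in it (sketch_dpamt_get(), pamt_refcount[], SKETCH_LEVEL_4K, and so on) is hypothetical, and it does no locking and issues no SEAMCALLs, unlike the real series, which does this under a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model, not the kernel's API: one reference count per
 * 2MB region of physical memory, counting the 4KB mappings in that region
 * that need 4KB-level PAMT entries.
 */
#define SKETCH_PFNS_PER_2M 512   /* 2MB / 4KB */
#define SKETCH_NR_REGIONS  16    /* tiny pretend physical address space */

enum sketch_level { SKETCH_LEVEL_4K = 1, SKETCH_LEVEL_2M = 2 };

static int pamt_refcount[SKETCH_NR_REGIONS];
/* Stand-in for "a PAMT page pair has been handed to the TDX Module". */
static bool pamt_backed[SKETCH_NR_REGIONS];

static size_t sketch_region_of(uint64_t pfn)
{
	size_t r = (size_t)(pfn / SKETCH_PFNS_PER_2M);

	assert(r < SKETCH_NR_REGIONS);
	return r;
}

/* Called before mapping @pfn at @level. */
static void sketch_dpamt_get(uint64_t pfn, enum sketch_level level)
{
	size_t r;

	/* Assumption: larger mappings are covered by PAMT allocated up front. */
	if (level != SKETCH_LEVEL_4K)
		return;

	r = sketch_region_of(pfn);
	/* First 4KB user in this 2MB region: install the backing pages. */
	if (pamt_refcount[r]++ == 0)
		pamt_backed[r] = true;
}

/* Called after unmapping @pfn at @level. */
static void sketch_dpamt_put(uint64_t pfn, enum sketch_level level)
{
	size_t r;

	if (level != SKETCH_LEVEL_4K)
		return;

	r = sketch_region_of(pfn);
	assert(pamt_refcount[r] > 0);
	/* Last 4KB user gone: the backing pages can be reclaimed. */
	if (--pamt_refcount[r] == 0)
		pamt_backed[r] = false;
}
```

For example, mapping pfns 0 and 1 at 4KB takes region 0's count to 2, and its backing is dropped only once both are unmapped, while a 2MB mapping of pfn 512 never touches the counters.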
> Rick Edgecombe (3):
> x86/virt/tdx: Simplify tdmr_get_pamt_sz()
> x86/tdx: Add APIs to support get/put of DPAMT entries from KVM, under spinlock
> KVM: x86/mmu: Prevent hugepage promotion for mirror roots in fault path
>
> Sean Christopherson (22):
> x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level
> KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
> KVM: TDX: Account all non-transient page allocations for per-TD structures
> KVM: x86: Make "external SPTE" ops that can fail RET0 static calls
> KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all
> KVM: x86/mmu: Fold set_external_spte_present() into its sole caller
> KVM: x86/mmu: Plumb the SPTE _pointer_ into the TDP MMU's handle_changed_spte()
> KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed_spte()
> KVM: x86: Rework .free_external_spt() into .reclaim_external_sp()
> KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
> KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page()
> *** DO NOT MERGE *** x86/virt/tdx: Don't assume guest memory is backed by struct page
> x86/virt/tdx: Extend "reset page" quirk to support huge pages
> KVM: x86/mmu: Plumb the old_spte into kvm_x86_ops.set_external_spte()
> KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte()
> KVM: TDX: Handle removal of leaf SPTEs in .set_private_spte()
> KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT
> KVM: TDX: Move S-EPT page demotion TODO to tdx_sept_set_private_spte()
> KVM: x86/mmu: Add Dynamic PAMT support in TDP MMU for vCPU-induced page split
> KVM: guest_memfd: Add helpers to get start/end gfns give gmem+slot+pgoff
> *** DO NOT MERGE *** KVM: guest_memfd: Add pre-zap arch hook for shared<=>private conversion
> KVM: x86/mmu: Add support for splitting S-EPT hugepages on conversion
>
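The last patch in the list above splits S-EPT hugepages when part of a range is converted between shared and private. A minimal sketch of the boundary arithmetic follows, assuming the conversion range is given as [start, end) in 4KB gfns, that the area is mapped entirely with 2MB pages, and that only hugepages straddling an edge of the range need demoting (hugepages fully inside the range can be zapped whole). These are assumptions, not details taken from the patch; sketch_split_points() and SKETCH_GFNS_PER_2M are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GFNS_PER_2M 512ULL   /* 2MB / 4KB */

static uint64_t sketch_align_down_2m(uint64_t gfn)
{
	return gfn & ~(SKETCH_GFNS_PER_2M - 1);
}

/*
 * Hypothetical helper: for a conversion range [start, end) in 4KB gfns,
 * write the base gfn of each 2MB hugepage that straddles a range boundary
 * into out[] and return how many there are (0, 1 or 2).
 */
static int sketch_split_points(uint64_t start, uint64_t end, uint64_t out[2])
{
	int n = 0;

	assert(start < end);

	/* An unaligned start lands in the middle of a hugepage. */
	if (start % SKETCH_GFNS_PER_2M)
		out[n++] = sketch_align_down_2m(start);

	/* Likewise an unaligned (exclusive) end, unless it shares that hugepage. */
	if (end % SKETCH_GFNS_PER_2M) {
		uint64_t base = sketch_align_down_2m(end);

		if (n == 0 || out[0] != base)
			out[n++] = base;
	}
	return n;
}
```

With this arithmetic, converting gfns [10, 600) demotes the hugepages at gfn 0 and gfn 512, while converting an exactly 2MB-aligned range demotes nothing.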
> Xiaoyao Li (1):
> x86/virt/tdx: Add API to demote a 2MB mapping to 512 4KB mappings
>
> Yan Zhao (6):
> x86/virt/tdx: Enhance tdh_mem_page_aug() to support huge pages
> x86/virt/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate huge pages
> KVM: TDX: Add core support for splitting/demoting 2MiB S-EPT to 4KiB
> KVM: x86: Introduce hugepage_set_guest_inhibit()
> KVM: TDX: Honor the guest's accept level contained in an EPT violation
> KVM: TDX: Turn on PG_LEVEL_2M
>
> Documentation/arch/x86/tdx.rst | 21 +
> arch/x86/coco/tdx/tdx.c | 10 +-
> arch/x86/include/asm/kvm-x86-ops.h | 9 +-
> arch/x86/include/asm/kvm_host.h | 36 +-
> arch/x86/include/asm/shared/tdx.h | 1 +
> arch/x86/include/asm/shared/tdx_errno.h | 104 +++
> arch/x86/include/asm/tdx.h | 127 ++--
> arch/x86/include/asm/tdx_global_metadata.h | 1 +
> arch/x86/kvm/Kconfig | 1 +
> arch/x86/kvm/mmu.h | 4 +
> arch/x86/kvm/mmu/mmu.c | 34 +-
> arch/x86/kvm/mmu/mmu_internal.h | 11 -
> arch/x86/kvm/mmu/tdp_mmu.c | 315 ++++----
> arch/x86/kvm/mmu/tdp_mmu.h | 2 +
> arch/x86/kvm/vmx/tdx.c | 468 +++++++++---
> arch/x86/kvm/vmx/tdx.h | 5 +-
> arch/x86/kvm/vmx/tdx_arch.h | 3 +
> arch/x86/kvm/vmx/tdx_errno.h | 40 -
> arch/x86/virt/vmx/tdx/tdx.c | 762 +++++++++++++++++---
> arch/x86/virt/vmx/tdx/tdx.h | 6 +-
> arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 +
> include/linux/kvm_host.h | 5 +
> include/linux/kvm_types.h | 2 +
> virt/kvm/Kconfig | 4 +
> virt/kvm/guest_memfd.c | 71 +-
> virt/kvm/kvm_main.c | 7 +-
> 26 files changed, 1576 insertions(+), 480 deletions(-)
> create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
> delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h
>
>
> base-commit: e81f7c908e1664233974b9f20beead78cde6343a
> --
> 2.53.0.rc1.217.geba53bf80e-goog
>
>