Message-ID: <aXuVR0kq_K1TYwlR@char.us.oracle.com>
Date: Thu, 29 Jan 2026 12:13:43 -0500
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        Kiryl Shutsemau <kas@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
        linux-kernel@...r.kernel.org, linux-coco@...ts.linux.dev,
        kvm@...r.kernel.org, Kai Huang <kai.huang@...el.com>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Yan Zhao <yan.y.zhao@...el.com>,
        Vishal Annapurve <vannapurve@...gle.com>,
        Ackerley Tng <ackerleytng@...gle.com>, Sagi Shahar <sagis@...gle.com>,
        Binbin Wu <binbin.wu@...ux.intel.com>,
        Xiaoyao Li <xiaoyao.li@...el.com>,
        Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage

On Wed, Jan 28, 2026 at 05:14:32PM -0800, Sean Christopherson wrote:
> This is a combined series of Dynamic PAMT (from Rick), and S-EPT hugepage
> support (from Yan).  Except for some last minute tweaks to the DPAMT array
> args stuff, a version of this based on a Google-internal kernel has been
> moderately well tested (thanks Vishal!).  But overall it's still firmly RFC
> as I have deliberately NOT addressed others' feedback from v4 of DPAMT and v3

What does PAMT stand for? Is there a design document somewhere?

> of S-EPT hugepage (mostly lack of cycles), and there's at least one patch in
> here that shouldn't be merged as-is (the quick-and-dirty switch from struct
> page to raw pfns).
> 
> My immediate goal is to solidify the designs for DPAMT and S-EPT hugepage.
> Given the substantial design changes I am proposing, posting an end-to-end
> RFC seemed like a much better method than trying to communicate my thoughts
> piecemeal.
> 
> As for landing these series, I think the fastest overall approach would be
> to land patches 1-4 asap (tangentially related cleanups and fixes), agree

Should they be split out as non-RFC then?

> on a design (hopefully), and then hand control back to Rick and Yan to polish
> their respective series for merge.
> 
> I also want to land the VMXON series[*] before DPAMT, because there's a nasty
> wart where KVM wires up a DPAMT-specific hook even if DPAMT is disabled,
> because KVM's ordering needs to set the vendor hooks before tdx_sysinfo is
> ready.  Decoupling VMXON from KVM solves that problem, because it lets the
> TDX subsystem parse sysinfo before TDX is loaded.
> 
> Beyond that dependency, I am comfortable landing both DPAMT and S-EPT hugepage
> support without any other prereqs, i.e. without an in-tree way to light up
> the S-EPT hugepage code due to lack of hugepage support in guest_memfd.

Could there be test cases? Or could simple code be posted for QEMU, which is
the tool that 99% of kernel engineers use?

> Outside of the guest_memfd arch hook for in-place conversion, S-EPT hugepage
> support doesn't have any direct dependencies/conflicts with guest_memfd
> hugepage or in-place conversion support (which is great, because it means we
> didn't totally botch the design!).  E.g. Vishal's been able to test this code
> precisely because it applies relatively cleanly on an internal branch with a
> whole pile of guest_memfd changes.
> 
> Applies on kvm-x86 next (specifically kvm-x86-next-2026.01.23).
> 
> [*] https://lore.kernel.org/all/20251206011054.494190-1-seanjc@google.com
> 
> P.S. I apologize if I clobbered any of the Author attribution or SoBs.  I
>      was moving patches around and synchronizing between an internal tree
>      and this upstream version, so things may have gotten a bit wonky.
> 
> Isaku Yamahata (1):
>   KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table
>     splitting
> 
> Kiryl Shutsemau (12):
>   x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
>   x86/tdx: Add helpers to check return status codes
>   x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
>   x86/virt/tdx: Allocate reference counters for PAMT memory
>   x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
>   x86/virt/tdx: Add tdx_alloc/free_control_page() helpers
>   x86/virt/tdx: Optimize tdx_alloc/free_control_page() helpers
>   KVM: TDX: Allocate PAMT memory for TD and vCPU control structures
>   KVM: TDX: Get/put PAMT pages when (un)mapping private memory
>   x86/virt/tdx: Enable Dynamic PAMT
>   Documentation/x86: Add documentation for TDX's Dynamic PAMT
>   x86/virt/tdx: Get/Put DPAMT page pair if and only if mapping size is
>     4KB
> 
> Rick Edgecombe (3):
>   x86/virt/tdx: Simplify tdmr_get_pamt_sz()
>   x86/tdx: Add APIs to support get/put of DPAMT entries from KVM, under
>     spinlock
>   KVM: x86/mmu: Prevent hugepage promotion for mirror roots in fault
>     path
> 
> Sean Christopherson (22):
>   x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level
>   KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE
>     "fails"
>   KVM: TDX: Account all non-transient page allocations for per-TD
>     structures
>   KVM: x86: Make "external SPTE" ops that can fail RET0 static calls
>   KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use
>     .set_external_spte() for all
>   KVM: x86/mmu: Fold set_external_spte_present() into its sole caller
>   KVM: x86/mmu: Plumb the SPTE _pointer_ into the TDP MMU's
>     handle_changed_spte()
>   KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in
>     handle_changed_spte()
>   KVM: x86: Rework .free_external_spt() into .reclaim_external_sp()
>   KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page
>     allocator
>   KVM: x86/mmu: Allocate/free S-EPT pages using
>     tdx_{alloc,free}_control_page()
>   *** DO NOT MERGE *** x86/virt/tdx: Don't assume guest memory is backed
>     by struct page
>   x86/virt/tdx: Extend "reset page" quirk to support huge pages
>   KVM: x86/mmu: Plumb the old_spte into kvm_x86_ops.set_external_spte()
>   KVM: TDX: Hoist tdx_sept_remove_private_spte() above
>     set_private_spte()
>   KVM: TDX: Handle removal of leaf SPTEs in .set_private_spte()
>   KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT
>   KVM: TDX: Move S-EPT page demotion TODO to tdx_sept_set_private_spte()
>   KVM: x86/mmu: Add Dynamic PAMT support in TDP MMU for vCPU-induced
>     page split
>   KVM: guest_memfd: Add helpers to get start/end gfns give
>     gmem+slot+pgoff
>   *** DO NOT MERGE *** KVM: guest_memfd: Add pre-zap arch hook for
>     shared<=>private conversion
>   KVM: x86/mmu: Add support for splitting S-EPT hugepages on conversion
> 
> Xiaoyao Li (1):
>   x86/virt/tdx: Add API to demote a 2MB mapping to 512 4KB mappings
> 
> Yan Zhao (6):
>   x86/virt/tdx: Enhance tdh_mem_page_aug() to support huge pages
>   x86/virt/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate huge
>     pages
>   KVM: TDX: Add core support for splitting/demoting 2MiB S-EPT to 4KiB
>   KVM: x86: Introduce hugepage_set_guest_inhibit()
>   KVM: TDX: Honor the guest's accept level contained in an EPT violation
>   KVM: TDX: Turn on PG_LEVEL_2M
> 
>  Documentation/arch/x86/tdx.rst              |  21 +
>  arch/x86/coco/tdx/tdx.c                     |  10 +-
>  arch/x86/include/asm/kvm-x86-ops.h          |   9 +-
>  arch/x86/include/asm/kvm_host.h             |  36 +-
>  arch/x86/include/asm/shared/tdx.h           |   1 +
>  arch/x86/include/asm/shared/tdx_errno.h     | 104 +++
>  arch/x86/include/asm/tdx.h                  | 127 ++--
>  arch/x86/include/asm/tdx_global_metadata.h  |   1 +
>  arch/x86/kvm/Kconfig                        |   1 +
>  arch/x86/kvm/mmu.h                          |   4 +
>  arch/x86/kvm/mmu/mmu.c                      |  34 +-
>  arch/x86/kvm/mmu/mmu_internal.h             |  11 -
>  arch/x86/kvm/mmu/tdp_mmu.c                  | 315 ++++----
>  arch/x86/kvm/mmu/tdp_mmu.h                  |   2 +
>  arch/x86/kvm/vmx/tdx.c                      | 468 +++++++++---
>  arch/x86/kvm/vmx/tdx.h                      |   5 +-
>  arch/x86/kvm/vmx/tdx_arch.h                 |   3 +
>  arch/x86/kvm/vmx/tdx_errno.h                |  40 -
>  arch/x86/virt/vmx/tdx/tdx.c                 | 762 +++++++++++++++++---
>  arch/x86/virt/vmx/tdx/tdx.h                 |   6 +-
>  arch/x86/virt/vmx/tdx/tdx_global_metadata.c |   7 +
>  include/linux/kvm_host.h                    |   5 +
>  include/linux/kvm_types.h                   |   2 +
>  virt/kvm/Kconfig                            |   4 +
>  virt/kvm/guest_memfd.c                      |  71 +-
>  virt/kvm/kvm_main.c                         |   7 +-
>  26 files changed, 1576 insertions(+), 480 deletions(-)
>  create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
>  delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h
> 
> 
> base-commit: e81f7c908e1664233974b9f20beead78cde6343a
> -- 
> 2.53.0.rc1.217.geba53bf80e-goog
> 
> 
