lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251121005125.417831-1-rick.p.edgecombe@intel.com>
Date: Thu, 20 Nov 2025 16:51:09 -0800
From: Rick Edgecombe <rick.p.edgecombe@...el.com>
To: bp@...en8.de,
	chao.gao@...el.com,
	dave.hansen@...el.com,
	isaku.yamahata@...el.com,
	kai.huang@...el.com,
	kas@...nel.org,
	kvm@...r.kernel.org,
	linux-coco@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	mingo@...hat.com,
	pbonzini@...hat.com,
	seanjc@...gle.com,
	tglx@...utronix.de,
	vannapurve@...gle.com,
	x86@...nel.org,
	yan.y.zhao@...el.com,
	xiaoyao.li@...el.com,
	binbin.wu@...el.com
Cc: rick.p.edgecombe@...el.com
Subject: [PATCH v4 00/16] TDX: Enable Dynamic PAMT

Hi, 

This is 4th revision of Dynamic PAMT, which is a new feature that reduces 
the memory use of TDX. For background information, see the v3 
coverletter[0]. V4 mostly consists of incorporating the feedback from v3. 
Notably what it *doesn’t* change is the solution for pre-allocating DPAMT
backing pages. (more info below in “Changes”)

I think this series isn’t quite ready to ask for merging just yet. I’d 
appreciate another round of review especially looking for any issues in the
refcount allocation/mapping and the pamt get/put lock-refcount dance. And
hopefully collect some RBs.

Sean/Paolo, we have mostly banished this all to TDX code except for “KVM: 
TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx: 
Add helpers to allow for pre-allocating pages” are probably the remaining 
possibly controversial parts of your domain. If you only have a short time 
to spend at this point, I’d point you at those two patches.

Since most of the changes are in arch/x86, I’d think this feature could be
a candidate for eventually merging through tip with Sean or Paolo’s ack.
But it is currently based on kvm-x86/next in order to build on top of the 
post-populate cleanup series. Next time I’ll probably target tip if people 
think that is a good way forward.

Changes
=======
There were two good suggestions around the pre-allocated pages solution
last time, but neither ended up working out:

1. Dave suggested to use mempool_t instead of the linked list based 
structure, in order to not re-invent the wheel. This turned out to not 
quite fit. The problems were that there wasn’t really a “topup” mechanism, 
or an atomic fallback (which matches the kvm cache behavior). This results 
in very similar code being built around mempool that was built around the 
linked list. It was an overall harder to follow solution for not much code 
savings. 

I strongly considered going back to Kiryl’s original solution which passed 
a callback function pointer for allocating DPAMT pages, and an opaque
void * that the callback could use to find the kvm_mmu_memory_cache. I
thought that readability issues of passing the opaque void * between
subsystems outweighed the small code duplication in the simple, familiar
patterned linked list-based code. So I ended up leaving it. 

2. Kai suggested (but later retracted the idea) that since the external 
page table cache was moved to TDX code, it could simply install DPAMT 
pages for the cache at topup time. Then the installation of DPAMT backing 
for S-EPT page tables could be done outside of the mmu_lock. It could also 
be argued that it makes the design simpler in a way, because the external
page table cache acts like it did before. Anything in there could be simply
used. 

At the time my argument against this was that whether a huge page would be 
installed (and thus, whether DPAMT backing was needed) for the guest 
private memory would not be known until later, so early install solution
would need special late handling for TDX huge pages. After some internal
discussions I at looked how we could simplify the series by punting on TDX
huge pages needs. 

But it turns out that this other design was actually more complex and had 
more LOC than the previous solution. So it was dropped, and again, I went 
back to the original solution. 


I’m really starting to think that, while the overall solution here isn’t 
the most elegant, we might not have much more to squeeze from it. So 
design-wise, I think we should think about calling it done. 

Testing
=======
Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also 
tested with TDX modules that don't support DPAMT, and with the two 
optimization patches removed: “Improve PAMT refcounters allocation for 
sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”.

[0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/ 


Kirill A. Shutemov (13):
  x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
  x86/tdx: Add helpers to check return status codes
  x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
  x86/virt/tdx: Allocate reference counters for PAMT memory
  x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
  x86/virt/tdx: Add tdx_alloc/free_page() helpers
  x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
  KVM: TDX: Allocate PAMT memory for TD control structures
  KVM: TDX: Allocate PAMT memory for vCPU control structures
  KVM: TDX: Handle PAMT allocation in fault path
  KVM: TDX: Reclaim PAMT memory
  x86/virt/tdx: Enable Dynamic PAMT
  Documentation/x86: Add documentation for TDX's Dynamic PAMT

Rick Edgecombe (3):
  x86/virt/tdx: Simplify tdmr_get_pamt_sz()
  KVM: TDX: Add x86 ops for external spt cache
  x86/virt/tdx: Add helpers to allow for pre-allocating pages

 Documentation/arch/x86/tdx.rst              |  21 +
 arch/x86/coco/tdx/tdx.c                     |  10 +-
 arch/x86/include/asm/kvm-x86-ops.h          |   3 +
 arch/x86/include/asm/kvm_host.h             |  14 +-
 arch/x86/include/asm/shared/tdx.h           |   8 +
 arch/x86/include/asm/shared/tdx_errno.h     | 104 ++++
 arch/x86/include/asm/tdx.h                  |  78 ++-
 arch/x86/include/asm/tdx_global_metadata.h  |   1 +
 arch/x86/kvm/mmu/mmu.c                      |   6 +-
 arch/x86/kvm/mmu/mmu_internal.h             |   2 +-
 arch/x86/kvm/vmx/tdx.c                      | 160 ++++--
 arch/x86/kvm/vmx/tdx.h                      |   3 +-
 arch/x86/kvm/vmx/tdx_errno.h                |  40 --
 arch/x86/virt/vmx/tdx/tdx.c                 | 587 +++++++++++++++++---
 arch/x86/virt/vmx/tdx/tdx.h                 |   5 +-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |   7 +
 16 files changed, 854 insertions(+), 195 deletions(-)
 create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
 delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h

-- 
2.51.2


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ