[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251121005125.417831-1-rick.p.edgecombe@intel.com>
Date: Thu, 20 Nov 2025 16:51:09 -0800
From: Rick Edgecombe <rick.p.edgecombe@...el.com>
To: bp@...en8.de,
chao.gao@...el.com,
dave.hansen@...el.com,
isaku.yamahata@...el.com,
kai.huang@...el.com,
kas@...nel.org,
kvm@...r.kernel.org,
linux-coco@...ts.linux.dev,
linux-kernel@...r.kernel.org,
mingo@...hat.com,
pbonzini@...hat.com,
seanjc@...gle.com,
tglx@...utronix.de,
vannapurve@...gle.com,
x86@...nel.org,
yan.y.zhao@...el.com,
xiaoyao.li@...el.com,
binbin.wu@...el.com
Cc: rick.p.edgecombe@...el.com
Subject: [PATCH v4 00/16] TDX: Enable Dynamic PAMT
Hi,
This is 4th revision of Dynamic PAMT, which is a new feature that reduces
the memory use of TDX. For background information, see the v3
coverletter[0]. V4 mostly consists of incorporating the feedback from v3.
Notably what it *doesn’t* change is the solution for pre-allocating DPAMT
backing pages. (more info below in “Changes”)
I think this series isn’t quite ready to ask for merging just yet. I’d
appreciate another round of review especially looking for any issues in the
refcount allocation/mapping and the pamt get/put lock-refcount dance. And
hopefully collect some RBs.
Sean/Paolo, we have mostly banished this all to TDX code except for “KVM:
TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx:
Add helpers to allow for pre-allocating pages” are probably the remaining
possibly controversial parts of your domain. If you only have a short time
to spend at this point, I’d point you at those two patches.
Since most of the changes are in arch/x86, I’d think this feature could be
a candidate for eventually merging through tip with Sean or Paolo’s ack.
But it is currently based on kvm-x86/next in order to build on top of the
post-populate cleanup series. Next time I’ll probably target tip if people
think that is a good way forward.
Changes
=======
There were two good suggestions around the pre-allocated pages solution
last time, but neither ended up working out:
1. Dave suggested to use mempool_t instead of the linked list based
structure, in order to not re-invent the wheel. This turned out to not
quite fit. The problems were that there wasn’t really a “topup” mechanism,
or an atomic fallback (which matches the kvm cache behavior). This results
in very similar code being built around mempool that was built around the
linked list. It was an overall harder to follow solution for not much code
savings.
I strongly considered going back to Kiryl’s original solution which passed
a callback function pointer for allocating DPAMT pages, and an opaque
void * that the callback could use to find the kvm_mmu_memory_cache. I
thought that readability issues of passing the opaque void * between
subsystems outweighed the small code duplication in the simple, familiar
patterned linked list-based code. So I ended up leaving it.
2. Kai suggested (but later retracted the idea) that since the external
page table cache was moved to TDX code, it could simply install DPAMT
pages for the cache at topup time. Then the installation of DPAMT backing
for S-EPT page tables could be done outside of the mmu_lock. It could also
be argued that it makes the design simpler in a way, because the external
page table cache acts like it did before. Anything in there could be simply
used.
At the time my argument against this was that whether a huge page would be
installed (and thus, whether DPAMT backing was needed) for the guest
private memory would not be known until later, so early install solution
would need special late handling for TDX huge pages. After some internal
discussions I at looked how we could simplify the series by punting on TDX
huge pages needs.
But it turns out that this other design was actually more complex and had
more LOC than the previous solution. So it was dropped, and again, I went
back to the original solution.
I’m really starting to think that, while the overall solution here isn’t
the most elegant, we might not have much more to squeeze from it. So
design-wise, I think we should think about calling it done.
Testing
=======
Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also
tested with TDX modules that don't support DPAMT, and with the two
optimization patches removed: “Improve PAMT refcounters allocation for
sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”.
[0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/
Kirill A. Shutemov (13):
x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
x86/tdx: Add helpers to check return status codes
x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
x86/virt/tdx: Allocate reference counters for PAMT memory
x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
x86/virt/tdx: Add tdx_alloc/free_page() helpers
x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
KVM: TDX: Allocate PAMT memory for TD control structures
KVM: TDX: Allocate PAMT memory for vCPU control structures
KVM: TDX: Handle PAMT allocation in fault path
KVM: TDX: Reclaim PAMT memory
x86/virt/tdx: Enable Dynamic PAMT
Documentation/x86: Add documentation for TDX's Dynamic PAMT
Rick Edgecombe (3):
x86/virt/tdx: Simplify tdmr_get_pamt_sz()
KVM: TDX: Add x86 ops for external spt cache
x86/virt/tdx: Add helpers to allow for pre-allocating pages
Documentation/arch/x86/tdx.rst | 21 +
arch/x86/coco/tdx/tdx.c | 10 +-
arch/x86/include/asm/kvm-x86-ops.h | 3 +
arch/x86/include/asm/kvm_host.h | 14 +-
arch/x86/include/asm/shared/tdx.h | 8 +
arch/x86/include/asm/shared/tdx_errno.h | 104 ++++
arch/x86/include/asm/tdx.h | 78 ++-
arch/x86/include/asm/tdx_global_metadata.h | 1 +
arch/x86/kvm/mmu/mmu.c | 6 +-
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/vmx/tdx.c | 160 ++++--
arch/x86/kvm/vmx/tdx.h | 3 +-
arch/x86/kvm/vmx/tdx_errno.h | 40 --
arch/x86/virt/vmx/tdx/tdx.c | 587 +++++++++++++++++---
arch/x86/virt/vmx/tdx/tdx.h | 5 +-
arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 +
16 files changed, 854 insertions(+), 195 deletions(-)
create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h
--
2.51.2
Powered by blists - more mailing lists