[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dde56556-7611-4adf-9015-5bdf1a016786@suse.com>
Date: Thu, 27 Nov 2025 18:11:32 +0200
From: Nikolay Borisov <nik.borisov@...e.com>
To: Rick Edgecombe <rick.p.edgecombe@...el.com>, bp@...en8.de,
chao.gao@...el.com, dave.hansen@...el.com, isaku.yamahata@...el.com,
kai.huang@...el.com, kas@...nel.org, kvm@...r.kernel.org,
linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org, mingo@...hat.com,
pbonzini@...hat.com, seanjc@...gle.com, tglx@...utronix.de,
vannapurve@...gle.com, x86@...nel.org, yan.y.zhao@...el.com,
xiaoyao.li@...el.com, binbin.wu@...el.com
Subject: Re: [PATCH v4 07/16] x86/virt/tdx: Add tdx_alloc/free_page() helpers
On 21.11.25 г. 2:51 ч., Rick Edgecombe wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
>
> Add helpers to use when allocating or preparing pages that need DPAMT
> backing. Make them handle races internally for the case of multiple
> callers trying operate on the same 2MB range simultaneously.
>
> While the TDX initialization code in arch/x86 uses pages with 2MB
> alignment, KVM will need to hand 4KB pages for it to use. Under DPAMT,
> these pages will need DPAMT backing 4KB backing.
>
> Add tdx_alloc_page() and tdx_free_page() to handle both page allocation
> and DPAMT installation. Make them behave like normal alloc/free functions
> where allocation can fail in the case of no memory, but free (with any
> necessary DPAMT release) always succeeds. Do this so they can support the
> existing TDX flows that require cleanups to succeed. Also create
> tdx_pamt_put()/tdx_pamt_get() to handle installing DPAMT 4KB backing for
> pages that are already allocated (such as external page tables, or S-EPT
> pages).
>
> Allocate the pages as GFP_KERNEL_ACCOUNT based on that the allocations
> will be easily user triggerable.
>
> Since the source of these pages is the page allocator, multiple TDs could
> each get 4KB pages that are covered by the same 2MB range. When this
> happens only one page pair needs to be installed to cover the 2MB range.
> Similarly, when one page is freed, the DPAMT backing cannot be freed until
> all TDX pages in the range are no longer in use. Have the helpers manage
> these races internally.
>
> So the requirements are that:
>
> 1. Free path cannot fail (i.e. no TDX module BUSY errors).
> 2. Allocation paths need to handle finding that DPAMT backing is already
> installed, and only return an error in the case of no memory, not in the
> case of losing races with other’s trying to operate on the same DPAMT
> range.
> 3. Free paths cannot fail, and also need to clean up the DPAMT backing
> when the last page in the 2MB range is no longer needed by TDX.
>
> Previous changes allocated refcounts to be used to track how many 4KB
> pages are in use by TDX for each 2MB region. So update those inside the
> helpers and use them to decide when to actually install the DPAMT backing
> pages.
>
> tdx_pamt_put() needs to guarantee the DPAMT is installed before returning
> so that racing threads don’t tell the TDX module to operate on the page
> before it’s installed. Take a lock while adjusting the refcount and doing
> the actual TDH.PHYMEM.PAMT.ADD/REMOVE to make sure these happen
> atomically. The lock is heavyweight, but will be optimized in future
> changes. Just do the simple solution before any complex improvements.
>
> TDH.PHYMEM.PAMT.ADD/REMOVE take exclusive locks at the granularity each
> 2MB range. A simultaneous attempt to operate on the same 2MB region would
> result in a BUSY error code returned from the SEAMCALL. Since the
> invocation of SEAMCALLs are behind a lock, this won’t conflict.
>
> Besides the contention between TDH.PHYMEM.PAMT.ADD/REMOVE, many other
> SEAMCALLs take the same 2MB granularity locks as shared. This means any
> attempt to operate on the page by the TDX module while simultaneously
> doing PAMT.ADD/REMOVE will result in a BUSY error. This should not happen,
> as the PAMT pages always has to be installed before giving the pages to
> the TDX module anyway.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> [Add feedback, update log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> ---
> v4:
> - Update tdx_find_pamt_refcount() calls to pass PFN and rely on
> internal PMD bucket calculations. Based on changes in previous patch.
> - Pull calculation TDX DPAMT 2MB range arg into helper.
> - Fix alloc_pamt_array() doesn't zero array on allocation failure (Yan)
> - Move "prealloc" comment to future patch. (Kai)
> - Use union for dpamt page array. (Dave)
> - Use sizeof(*args_array) everywhere instead of sizeof(u64) in some
> places. (Dave)
> - Fix refcount inc/dec cases. (Xiaoyao)
> - Rearrange error handling in tdx_pamt_get()/tdx_pamt_put() to remove
> some indented lines.
> - Make alloc_pamt_array() use GFP_KERNEL_ACCOUNT like the pre-fault
> path does later.
>
> v3:
> - Fix hard to follow iteration over struct members.
> - Simplify the code by removing the intermediate lists of pages.
> - Clear PAMT pages before freeing. (Adrian)
> - Rename tdx_nr_pamt_pages(). (Dave)
> - Add comments some comments, but thought the simpler code needed
> less. So not as much as seem to be requested. (Dave)
> - Fix asymmetry in which level of the add/remove helpers global lock is
> held.
> - Split out optimization.
> - Write log.
> - Flatten call hierarchies and adjust errors accordingly.
> ---
> arch/x86/include/asm/shared/tdx.h | 7 +
> arch/x86/include/asm/tdx.h | 8 +-
> arch/x86/virt/vmx/tdx/tdx.c | 258 ++++++++++++++++++++++++++++++
> arch/x86/virt/vmx/tdx/tdx.h | 2 +
> 4 files changed, 274 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
> index 6a1646fc2b2f..cc2f251cb791 100644
> --- a/arch/x86/include/asm/shared/tdx.h
> +++ b/arch/x86/include/asm/shared/tdx.h
> @@ -145,6 +145,13 @@ struct tdx_module_args {
> u64 rsi;
> };
>
> +struct tdx_module_array_args {
> + union {
> + struct tdx_module_args args;
> + u64 args_array[sizeof(struct tdx_module_args) / sizeof(u64)];
> + };
> +};
> +
> /* Used to communicate with the TDX module */
> u64 __tdcall(u64 fn, struct tdx_module_args *args);
> u64 __tdcall_ret(u64 fn, struct tdx_module_args *args);
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index cf51ccd16194..914213123d94 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -135,11 +135,17 @@ static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sysinfo)
> return false; /* To be enabled when kernel is ready */
> }
>
> +void tdx_quirk_reset_page(struct page *page);
> +
> int tdx_guest_keyid_alloc(void);
> u32 tdx_get_nr_guest_keyids(void);
> void tdx_guest_keyid_free(unsigned int keyid);
>
> -void tdx_quirk_reset_page(struct page *page);
> +int tdx_pamt_get(struct page *page);
> +void tdx_pamt_put(struct page *page);
> +
> +struct page *tdx_alloc_page(void);
> +void tdx_free_page(struct page *page);
>
> struct tdx_td {
> /* TD root structure: */
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index edf9182ed86d..745b308785d6 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -2009,6 +2009,264 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
> }
> EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);
>
> +/* Number PAMT pages to be provided to TDX module per 2M region of PA */
> +static int tdx_dpamt_entry_pages(void)
> +{
> + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
> + return 0;
> +
> + return tdx_sysinfo.tdmr.pamt_4k_entry_size * PTRS_PER_PTE / PAGE_SIZE;
> +}
Isn't this guaranteed to return 2 always as per the ABI? Can't the
allocation of the 2 pages be moved closer to where it's used - in
tdh_phymem_pamt_add which will simplify things a bit?
<snip>
Powered by blists - more mailing lists