Message-ID: <a7550dbfc61009ef5f6c81ea6681c38e64a265a2.camel@intel.com>
Date: Mon, 1 Dec 2025 22:31:29 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
	<linux-coco@...ts.linux.dev>, "Huang, Kai" <kai.huang@...el.com>, "Li,
 Xiaoyao" <xiaoyao.li@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
	"Zhao, Yan Y" <yan.y.zhao@...el.com>, "Wu, Binbin" <binbin.wu@...el.com>,
	"kas@...nel.org" <kas@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"Annapurve, Vishal" <vannapurve@...gle.com>, "Gao, Chao"
	<chao.gao@...el.com>, "bp@...en8.de" <bp@...en8.de>, "x86@...nel.org"
	<x86@...nel.org>
CC: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH v4 07/16] x86/virt/tdx: Add tdx_alloc/free_page() helpers

On Thu, 2025-11-27 at 14:29 +0200, Nikolay Borisov wrote:
> > While the TDX initialization code in arch/x86 uses pages with 2MB
> > alignment, KVM will need to hand 4KB pages for it to use. Under DPAMT,
> > these pages will need DPAMT backing 4KB backing.
> 
> That paragraph is rather hard to parse. KVM will need to hand 4k pages 
> to whom? The tdx init code? Also, the last sentence with the two "backing" 
> words is hard to parse. Does it say that the 4k pages KVM needs to 
> pass must be backed by DPAMT pages, i.e. like a chicken-and-egg problem?

You're right, it's confusing and also has a slightly misleading typo. Maybe this
is an improvement?

   While the TDX module initialization code in arch/x86 only hands pages to the
   TDX module with 2MB alignment, KVM will need to hand pages to the TDX module
   at 4KB granularity. Under DPAMT, such pages will require performing the
   TDH.PHYMEM.PAMT.ADD SEAMCALL to give a page pair to the TDX module to use
   for PAMT tracking at the 4KB page size.

Note: There is no chicken-and-egg problem; TDH.PHYMEM.PAMT.ADD handles it.

> 
> > 
> > Add tdx_alloc_page() and tdx_free_page() to handle both page allocation
> > and DPAMT installation. Make them behave like normal alloc/free functions
> > where allocation can fail in the case of no memory, but free (with any
> > necessary DPAMT release) always succeeds. Do this so they can support the
> > existing TDX flows that require cleanups to succeed. Also create
> > tdx_pamt_put()/tdx_pamt_get() to handle installing DPAMT 4KB backing for
> > pages that are already allocated (such as external page tables, or S-EPT
> > pages).
> > 
> 
> <snip>
> 
> 
> > +
> > +/* Serializes adding/removing PAMT memory */
> > +static DEFINE_SPINLOCK(pamt_lock);
> > +
> > +/* Bump PAMT refcount for the given page and allocate PAMT memory if needed */
> > +int tdx_pamt_get(struct page *page)
> > +{
> > +	u64 pamt_pa_array[MAX_TDX_ARG_SIZE(rdx)];
> > +	atomic_t *pamt_refcount;
> > +	u64 tdx_status;
> > +	int ret;
> > +
> > +	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
> > +		return 0;
> > +
> > +	ret = alloc_pamt_array(pamt_pa_array);
> > +	if (ret)
> > +		goto out_free;
> > +
> > +	pamt_refcount = tdx_find_pamt_refcount(page_to_pfn(page));
> > +
> > +	scoped_guard(spinlock, &pamt_lock) {
> > +		/*
> > +		 * If the pamt page is already added (i.e. refcount >= 1),
> > +		 * then just increment the refcount.
> > +		 */
> > +		if (atomic_read(pamt_refcount)) {
> > +			atomic_inc(pamt_refcount);
> > +			goto out_free;
> > +		}
> 
> Replace this pair of read/inc with a single call to atomic_inc_not_zero()

From the thread with Binbin on this patch, the atomics aren't really needed
until the optimization patch. So I was thinking to actually use simpler
non-atomic operations.

> 
> > +
> > +		/* Try to add the pamt page and take the refcount 0->1. */
> > +
> > +		tdx_status = tdh_phymem_pamt_add(page, pamt_pa_array);
> > +		if (!IS_TDX_SUCCESS(tdx_status)) {
> > +			pr_err("TDH_PHYMEM_PAMT_ADD failed: %#llx\n", tdx_status);
> > +			goto out_free;
> > +		}
> > +
> > +		atomic_inc(pamt_refcount);
> > +	}
> > +
> > +	return ret;
> > +out_free:
> > +	/*
> > +	 * pamt_pa_array is populated or zeroed up to tdx_dpamt_entry_pages()
> > +	 * above. free_pamt_array() can handle either case.
> > +	 */
> > +	free_pamt_array(pamt_pa_array);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_pamt_get);
> > +
> > +/*
> > + * Drop PAMT refcount for the given page and free PAMT memory if it is no
> > + * longer needed.
> > + */
> > +void tdx_pamt_put(struct page *page)
> > +{
> > +	u64 pamt_pa_array[MAX_TDX_ARG_SIZE(rdx)];
> > +	atomic_t *pamt_refcount;
> > +	u64 tdx_status;
> > +
> > +	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
> > +		return;
> > +
> > +	pamt_refcount = tdx_find_pamt_refcount(page_to_pfn(page));
> > +
> > +	scoped_guard(spinlock, &pamt_lock) {
> > +		/*
> > +		 * If there are more than one reference on the pamt page,
> > +		 * don't remove it yet. Just decrement the refcount.
> > +		 */
> > +		if (atomic_read(pamt_refcount) > 1) {
> > +			atomic_dec(pamt_refcount);
> > +			return;
> > +		}
> 
> nit: Could be replaced with: atomic_add_unless(pamt_refcount, -1, 1);
> 
> Probably it would have been better to simply use atomic64_dec_and_test 
> and, if it returns true, do the phymem_pamt_remove. But I suspect you 
> can't do it because in case it fails you don't want to decrement the 
> last refcount, though that could be remedied by an extra atomic_inc in 
> the failure path. I guess it might be worth simplifying, since the extra 
> inc will only be needed in exceptional cases (we don't expect failure to 
> be the usual path) and freeing is not a fast path.

The goal of this patch is to be as simple and obviously correct as possible.
Then the next patch "x86/virt/tdx: Optimize tdx_alloc/free_page() helpers"
should have the optimized versions. Do you have any similar suggestions on the
code after the next patch is applied?
