Message-ID: <404c3dddef3025537d2942386ab0ea0f72ab9dc3.camel@intel.com>
Date: Wed, 1 Oct 2025 19:00:51 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-coco@...ts.linux.dev"
	<linux-coco@...ts.linux.dev>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "kas@...nel.org"
	<kas@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>, "mingo@...hat.com"
	<mingo@...hat.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "Gao, Chao" <chao.gao@...el.com>, "bp@...en8.de"
	<bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 06/16] x86/virt/tdx: Improve PAMT refcounters
 allocation for sparse memory

On Wed, 2025-10-01 at 10:40 +0000, Huang, Kai wrote:
> On Wed, 2025-10-01 at 00:32 +0000, Edgecombe, Rick P wrote:
> > 
> > Part of the problem is that tdx_find_pamt_refcount() expects the hpa passed in
> > to be PMD aligned. The other callers of tdx_find_pamt_refcount() also make sure
> > that the PA passed in is 2MB aligned before calling, and compute this starting
> > with a PFN. So to try to make it easier to read and be correct what do you think
> > about the below:
> > 
> > static atomic_t *tdx_find_pamt_refcount(unsigned long pfn) {
> >     unsigned long hpa = ALIGN_DOWN(pfn, PMD_SIZE);
> 
> Shouldn't it be:
> 
> 	hpa = ALIGN_DOWN(PFN_PHYS(pfn), PMD_SIZE));
> ?

Err, yes.

> > 
> >     return &pamt_refcounts[hpa / PMD_SIZE];
> > }
> > 
> > /*
> >  * 'start_pfn' is inclusive and 'end_pfn' is exclusive. 
> > 
> 
> I think 'end_pfn' is exclusive is a little bit confusing?
> 

It's trying to say that end_pfn is an exclusive bound, so the range in
question does not actually include end_pfn. That is how the calling code looked
to me, but please check.

Maybe I could clarify that start_pfn and end_pfn are a range? It seems redundant.

>   It sounds like
> the physical range from PFN_PHYS(end_pfn - 1) to PFN_PHYS(end_pfn) is also
> exclusive, but it is actually not?
> 

Not sure what you mean. For clarity: when describing a range, sometimes the start
or end includes that value, and sometimes it doesn't. So exclusive here means
that end_pfn is not included in the range. As in, we don't need a refcount
allocation for end_pfn.

>   To me it's more like only the physical
> address PFN_PHYS(end_pfn) is exclusive.
> 
> > Compute the 
> >  * page range to be inclusive of the start and end refcount
> >  * addresses and at least a page in size. The teardown logic needs
> >  * to handle potentially overlapping refcounts mappings resulting
> >  * from this.
> >  */
> > start = (unsigned long)tdx_find_pamt_refcount(start_pfn);
> > end   = (unsigned long)tdx_find_pamt_refcount(end_pfn - 1);
> > start = ALIGN_DOWN(start, PAGE_SIZE);
> > end   = ALIGN_DOWN(end, PAGE_SIZE) + PAGE_SIZE;
> 
> This looks fine to me.  I mean the result should be correct, but the
> 'end_pfn - 1' (due to 'end_pfn' is exclusive) is a bit confusing to me as
> said above, but maybe it's only me, so feel free to ignore.

Sorry, I'm not following the confusion. Maybe we can have a quick chat when you
come online.

> 
> Or, as said above, I think the problem of the "Above fix" is when
> calculating the @end we didn't consider the case where it equals to @start
> and is already page aligned.  Does below work (assuming
> tdx_find_pamt_refcount() still takes 'hpa')?
> 
>     start = (unsigned long)tdx_find_pamt_refcount(PFN_PHYS(start_pfn));
>     end   = (unsigned long)tdx_find_pamt_refcount(PFN_PHYS(end_pfn) - 1);
>     start = PAGE_ALIGN_DOWN(start);
>     end   = PAGE_ALIGN_DOWN(end) + PAGE_SIZE;

The refcounts are actually per page/pfn, not per PA. So I think introducing the
concept of a refcount for a PA, especially as part of the interface, adds
confusion. The math happens to work out, but it's a layer of indirection.

The other problem this collides with is that tdx_find_pamt_refcount() callers
all have to convert PFN to PA. This should be fixed.


I guess we are really doing two separate calculations: first calculate the range
of refcounts we need, and then calculate the range of vmalloc space we need. How
about this, is it clearer to you? It is very explicit about what we are actually
doing and why, at the expense of minimizing operations. In this slow path, I
think clarity is the priority.

static int alloc_pamt_refcount(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long refcount_first, refcount_last;
	unsigned long mapping_start, mapping_end;

	/*
	 * 'start_pfn' is inclusive and 'end_pfn' is exclusive. Find the
	 * range of refcounts the pfn range will need.
	 */
	refcount_first = (unsigned long)tdx_find_pamt_refcount(start_pfn);
	refcount_last  = (unsigned long)tdx_find_pamt_refcount(end_pfn - 1);

	/*
	 * Calculate the page aligned range that includes the refcounts. The
	 * teardown logic needs to handle potentially overlapping refcount
	 * mappings resulting from the alignments.
	 */
	mapping_start = round_down(refcount_first, PAGE_SIZE);
	mapping_end   = round_up(refcount_last + sizeof(*pamt_refcounts),
				 PAGE_SIZE);

	return apply_to_page_range(&init_mm, mapping_start,
				   mapping_end - mapping_start,
				   pamt_refcount_populate, NULL);
}

> 
> Anyway, don't have strong opinion here, so up to you.
