lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aV4jihx/MHOl0+v6@yzhao56-desk.sh.intel.com>
Date: Wed, 7 Jan 2026 17:12:42 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <kas@...nel.org>, <tabba@...gle.com>,
	<ackerleytng@...gle.com>, <michael.roth@....com>, <david@...nel.org>,
	<vannapurve@...gle.com>, <sagis@...gle.com>, <vbabka@...e.cz>,
	<thomas.lendacky@....com>, <nik.borisov@...e.com>, <pgonda@...gle.com>,
	<fan.du@...el.com>, <jun.miao@...el.com>, <francescolavra.fl@...il.com>,
	<jgross@...e.com>, <ira.weiny@...el.com>, <isaku.yamahata@...el.com>,
	<xiaoyao.li@...el.com>, <kai.huang@...el.com>, <binbin.wu@...ux.intel.com>,
	<chao.p.peng@...el.com>, <chao.gao@...el.com>
Subject: Re: [PATCH v3 01/24] x86/tdx: Enhance tdh_mem_page_aug() to support
 huge pages

Hi Dave

Thanks for the review!

On Tue, Jan 06, 2026 at 01:08:00PM -0800, Dave Hansen wrote:
> On 1/6/26 02:18, Yan Zhao wrote:
> > Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.
> > 
> > The SEAMCALL TDH_MEM_PAGE_AUG currently supports adding physical memory to
> > the S-EPT up to 2MB in size.
> > 
> > While keeping the "level" parameter in the tdh_mem_page_aug() wrapper to
> > allow callers to specify the physical memory size, introduce the parameters
> > "folio" and "start_idx" to specify the physical memory starting from the
> > page at "start_idx" within the "folio". The specified physical memory must
> > be fully contained within a single folio.
> > 
> > Invoke tdx_clflush_page() for each 4KB segment of the physical memory being
> > added. tdx_clflush_page() performs CLFLUSH operations conservatively to
> > prevent dirty cache lines from writing back later and corrupting TD memory.
> 
> This changelog is heavy on the "what" and weak on the "why". It's not
> telling me what I need to know.
Indeed. I missed that. I'll keep it in mind. Thanks!

> > +	struct folio *folio = page_folio(page);
> >  	gpa_t gpa = gfn_to_gpa(gfn);
> >  	u64 entry, level_state;
> >  	u64 err;
> >  
> > -	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &level_state);
> > -
> > +	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, folio,
> > +			       folio_page_idx(folio, page), &entry, &level_state);
> >  	if (unlikely(IS_TDX_OPERAND_BUSY(err)))
> >  		return -EBUSY;
> 
> For example, 'folio' is able to be trivially derived from page. Yet,
> this removes the 'page' argument and replaces it with 'folio' _and_
> another value which can be derived from 'page'.
> 
> This looks superficially like an illogical change. *Why* was this done?
Sorry for missing the "why".

I think we can alternatively derive "folio" and "start_idx" from "page" inside
the wrapper tdh_mem_page_aug() for huge pages.

However, my understanding is that it's better for functions expecting huge pages
to explicitly receive "folio" instead of "page". This way, people can tell from
a function's declaration what the function expects. Is this understanding
correct?

Passing "start_idx" along with "folio" is due to the requirement of mapping only
a sub-range of a huge folio. e.g., we allow creating a 2MB mapping starting from
the nth idx of a 1GB folio.

On the other hand, if we instead pass "page" to tdh_mem_page_aug() for huge
pages and have tdh_mem_page_aug() internally convert it to "folio" and
"start_idx", it makes me wonder if we could have previously just passed "pfn" to
tdh_mem_page_aug() and had tdh_mem_page_aug() convert it to "page".

> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index b0b33f606c11..41ce18619ffc 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1743,16 +1743,23 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
> >  }
> >  EXPORT_SYMBOL_GPL(tdh_vp_addcx);
> >  
> > -u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
> > +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct folio *folio,
> > +		     unsigned long start_idx, u64 *ext_err1, u64 *ext_err2)
> >  {
> >  	struct tdx_module_args args = {
> >  		.rcx = gpa | level,
> >  		.rdx = tdx_tdr_pa(td),
> > -		.r8 = page_to_phys(page),
> > +		.r8 = page_to_phys(folio_page(folio, start_idx)),
> >  	};
> > +	unsigned long npages = 1 << (level * PTE_SHIFT);
> >  	u64 ret;
> 
> This 'npages' calculation is not obviously correct. It's not clear what
> "level" is or what values it should have.
> 
> This is precisely the kind of place to deploy a helper that explains
> what is going on.
Will do. Thanks for pointing it out!

> > -	tdx_clflush_page(page);
> > +	if (start_idx + npages > folio_nr_pages(folio))
> > +		return TDX_OPERAND_INVALID;
> 
> Why is this necessary? Would it be a bug if this happens?
This sanity check is due to the requirement in KVM that mapping size should be
no larger than the backend folio size, which ensures the mapping pages are
physically contiguous with homogeneous page attributes. (See the discussion
about "EPT mapping size and folio size" in thread [1]).

Failure of the sanity check could only be due to bugs in the caller (KVM). I
didn't convert the sanity check to an assertion because there's already a
TDX_BUG_ON_2() on error following the invocation of tdh_mem_page_aug() in KVM.

Also, there's no alignment checking because SEAMCALL TDH_MEM_PAGE_AUG() would
fail with a misaligned base PFN.

[1] https://lore.kernel.org/all/aV2A39fXgzuM4Toa@google.com/

> > +	for (int i = 0; i < npages; i++)
> > +		tdx_clflush_page(folio_page(folio, start_idx + i));
> 
> All of the page<->folio conversions are kinda hurting my brain. I think
> we need to decide what the canonical type for these things is in TDX, do
> the conversion once, and stick with it.
Got it!

Since passing in base "page" or base "pfn" may still require the
wrappers/helpers to internally convert them to "folio" for sanity checks, could
we decide that "folio" and "start_idx" are the canonical params for functions
expecting huge pages? Or do you prefer KVM to do the sanity check by itself?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ