lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aW3MGJxR3IuC/tDV@yzhao56-desk.sh.intel.com>
Date: Mon, 19 Jan 2026 14:15:52 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
CC: "Du, Fan" <fan.du@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"Li, Xiaoyao" <xiaoyao.li@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com"
	<tabba@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>, "david@...nel.org"
	<david@...nel.org>, "michael.roth@....com" <michael.roth@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Weiny, Ira" <ira.weiny@...el.com>,
	"kas@...nel.org" <kas@...nel.org>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>,
	"Peng, Chao P" <chao.p.peng@...el.com>, "francescolavra.fl@...il.com"
	<francescolavra.fl@...il.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"sagis@...gle.com" <sagis@...gle.com>, "Gao, Chao" <chao.gao@...el.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "Miao, Jun"
	<jun.miao@...el.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
	"jgross@...e.com" <jgross@...e.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v3 02/24] x86/virt/tdx: Add SEAMCALL wrapper
 tdh_mem_page_demote()

On Fri, Jan 16, 2026 at 07:10:05PM +0800, Huang, Kai wrote:
> On Fri, 2026-01-16 at 16:35 +0800, Yan Zhao wrote:
> > Hi Kai,
> > Thanks for reviewing!
> > 
> > On Fri, Jan 16, 2026 at 09:00:29AM +0800, Huang, Kai wrote:
> > > 
> > > > 
> > > > Enable tdh_mem_page_demote() only on TDX modules that support feature
> > > > TDX_FEATURES0.ENHANCE_DEMOTE_INTERRUPTIBILITY, which does not return error
> > > > TDX_INTERRUPTED_RESTARTABLE on basic TDX (i.e., without TD partition) [2].
> > > > 
> > > > This is because error TDX_INTERRUPTED_RESTARTABLE is difficult to handle.
> > > > The TDX module provides no guaranteed maximum retry count to ensure forward
> > > > progress of the demotion. Interrupt storms could then result in a DoS if
> > > > host simply retries endlessly for TDX_INTERRUPTED_RESTARTABLE. Disabling
> > > > interrupts before invoking the SEAMCALL also doesn't work because NMIs can
> > > > also trigger TDX_INTERRUPTED_RESTARTABLE. Therefore, the tradeoff for basic
> > > > TDX is to disable the TDX_INTERRUPTED_RESTARTABLE error given the
> > > > reasonable execution time for demotion. [1]
> > > > 
> > > 
> > > [...]
> > > 
> > > > v3:
> > > > - Use a var name that clearly tell that the page is used as a page table
> > > >   page. (Binbin).
> > > > - Check if TDX module supports feature ENHANCE_DEMOTE_INTERRUPTIBILITY.
> > > >   (Kai).
> > > > 
> > > [...]
> > > 
> > > > +u64 tdh_mem_page_demote(struct tdx_td *td, u64 gpa, int level, struct page *new_sept_page,
> > > > +			u64 *ext_err1, u64 *ext_err2)
> > > > +{
> > > > +	struct tdx_module_args args = {
> > > > +		.rcx = gpa | level,
> > > > +		.rdx = tdx_tdr_pa(td),
> > > > +		.r8 = page_to_phys(new_sept_page),
> > > > +	};
> > > > +	u64 ret;
> > > > +
> > > > +	if (!tdx_supports_demote_nointerrupt(&tdx_sysinfo))
> > > > +		return TDX_SW_ERROR;
> > > > 
> > > 
> > > For the record, while I replied my suggestion [*] to this patch in v2, it
> > > was basically because the discussion was already in that patch -- I didn't
> > > mean to do this check inside tdh_mem_page_demote(), but do this check in
> > > KVM page fault patch and return 4K as maximum mapping level.
> > > 
> > > The precise words were:
> > > 
> > >   So if the decision is to not use 2M page when TDH_MEM_PAGE_DEMOTE can 
> > >   return TDX_INTERRUPTED_RESTARTABLE, maybe we can just check this 
> > >   enumeration in fault handler and always make mapping level as 4K?
> > Right. I followed it in the last patch (patch 24).
> > 
> > > Looking at this series, this is eventually done in your last patch.  But I
> > > don't quite understand what's the additional value of doing such check and
> > > return TDX_SW_ERROR in this SEAMCALL wrapper.
> > > 
> > > Currently in this series, it doesn't matter whether this wrapper returns
> > > TDX_SW_ERROR or the real TDX_INTERRUPTED_RESTARTABLE -- KVM terminates the
> > > TD anyway (see your patch 8) because this is unexpected as checked in your
> > > last patch.
> > > 
> > > IMHO we should get rid of this check in this low level wrapper.
> > You are right, the wrapper shouldn't hit this error after the last patch.
> > 
> > However, I found it's better to introduce the feature bit
> > TDX_FEATURES0_ENHANCE_DEMOTE_INTERRUPTIBILITY and the helper
> > tdx_supports_demote_nointerrupt() together with the demote SEAMCALL wrapper.
> > This way, people can understand how the TDX_INTERRUPTED_RESTARTABLE error is
> > handled for this SEAMCALL. 
> > 
> 
> So the "handling" here is basically making DEMOTE SEAMCALL unavailable
> when DEMOTE is interruptible at low SEAMCALL wrapper level.
> 
> I guess you can argue this has some value since it tells users "don't even
> try to call me when I am interruptible because I am not available".  
Right. The caller can understand the API usage by examining the code
implementation.

> However, IMHO this also implies the benefit is mostly for the case where
> the user wants to use this wrapper to tell whether DEMOTE is available. 
> E.g.,
> 
> 	err = tdh_mem_page_demote(...);
> 	if (err == TDX_SW_ERROR)
> 		enable_tdx_hugepage = false;
This use case is not valid.
When the caller invokes tdh_mem_page_demote(), it means huge pages have already
been enabled, so turning huge pages off on error from splitting huge pages is
self-contradictory.

> But in this series you are using tdx_supports_demote_nointerrupt() for
> this purpose, which is better IMHO.
> 
> So maybe there's a *theoretical* value to have the check here, but I don't
> see any *real* value.
> 
> But I don't have strong opinion either -- I guess I just don't like making
> these low level SEAMCALL wrappers more complicated than what the SEAMCALL
> does -- and it's up to you to decide. :-)
Thanks. I added the checking in the SEAMCALL wrapper for two reasons:
- Let the callers know what the wrapper is expected to work under. So, the
  caller (e.g., KVM) can turn off huge pages upon detecting an incompatible TDX
  module. And forgetting to turn off huge pages would yield at least a WARNING.

- Give tdx_supports_demote_nointerrupt() a user in this patch which introduces
  the helper.

So, I'll keep the check unless someone has a strong opinion :)

> > What do you think about changing it to a WARN_ON_ONCE()? i.e.,
> > WARN_ON_ONCE(!tdx_supports_demote_nointerrupt(&tdx_sysinfo));
> 
> What's your intention?
Hmm, either TDX_SW_ERROR or WARN_ON_ONCE() is fine with me.

I've asked about it in [1]. Let's wait for the maintainers' reply.

[1] https://lore.kernel.org/all/aW3G6yZuvclYABzP@yzhao56-desk.sh.intel.com

> W/o the WARN(), the caller _can_ call this wrapper (i.e., not a kernel
> bug) but it always get a SW-defined error.  Again, maybe it has value for
> the case where the caller wants to use this to tell whether DEMOTE is
> available.
> 
> With the WARN(), it's a kernel bug to call the wrapper, and the caller
> needs to use other way (i.e., tdx_supports_demote_nointerrupt()) to tell
> whether DEMOTE is available.
> 
> So if you want the check, probably WARN() is a better idea since I suppose
> we always want users to use tdx_supports_demote_nointerrupt() to know
> whether DEMOTE can be done, and the WARN() is just to catch bug.
Agreed.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ