linux-kernel - Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is RUNNABLE

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <f9a2354f8265efb9ed99beb871e471f92adf133f.camel@intel.com>
Date: Mon, 19 May 2025 16:53:33 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Zhao, Yan Y" <yan.y.zhao@...el.com>, "Huang, Kai" <kai.huang@...el.com>
CC: "Shutemov, Kirill" <kirill.shutemov@...el.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Du, Fan" <fan.du@...el.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, "Li,
 Zhiquan1" <zhiquan1.li@...el.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
	"tabba@...gle.com" <tabba@...gle.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Weiny, Ira" <ira.weiny@...el.com>, "michael.roth@....com"
	<michael.roth@....com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
	<jun.miao@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is
 RUNNABLE

On Mon, 2025-05-19 at 16:32 +0800, Yan Zhao wrote:
> > On the opposite, if other non-Linux TDs don't follow 1G->2M->4K accept order,
> > e.g., they always accept 4K, there could be *endless EPT violation* if I
> > understand your words correctly.
> > 
> > Isn't this yet-another reason we should choose to return PG_LEVEL_4K instead of
> > 2M if no accept level is provided in the fault?
> As I said, returning PG_LEVEL_4K would disallow huge pages for non-Linux TDs.
> TD's accept operations at size > 4KB will get TDACCEPT_SIZE_MISMATCH.

TDX_PAGE_SIZE_MISMATCH is a valid error code that the guest should handle. The
docs say the VMM needs to demote *if* the mapping is large and the accept size
is small. But if we map at 4k size for non-accept EPT violations, we won't hit
this case. I also wonder what prevents the TDX module from handling a 2MB accept
size on 4k mappings. Maybe that could be changed.
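
A minimal sketch of the size-matching rule described above, assuming the outcomes are exactly as stated in this thread: a mapping larger than the accept size means the VMM must demote, and a mapping smaller than the accept size gives the guest TDX_PAGE_SIZE_MISMATCH. `classify_accept()`, `non_accept_fault_level()` and the enum names are hypothetical and are not KVM or TDX module APIs; `enum pg_level` is a local stand-in where only the ordering matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the x86 page levels; only the ordering matters here. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Hypothetical classification of a TDACCEPT against an existing mapping. */
enum accept_outcome {
	ACCEPT_OK,                  /* mapping and accept sizes match */
	ACCEPT_VMM_MUST_DEMOTE,     /* mapping larger than accept: VMM demotes */
	ACCEPT_GUEST_SIZE_MISMATCH  /* mapping smaller than accept: guest gets
				       TDX_PAGE_SIZE_MISMATCH (a Linux guest
				       then tries the next smaller size) */
};

static enum accept_outcome classify_accept(enum pg_level map_level,
					   enum pg_level accept_level)
{
	assert(map_level >= PG_LEVEL_4K && map_level <= PG_LEVEL_1G);
	assert(accept_level >= PG_LEVEL_4K && accept_level <= PG_LEVEL_1G);

	if (map_level > accept_level)
		return ACCEPT_VMM_MUST_DEMOTE;
	if (map_level < accept_level)
		return ACCEPT_GUEST_SIZE_MISMATCH;
	return ACCEPT_OK;
}

/*
 * Hypothetical: mapping level for an EPT violation that carries no accept
 * info, under the "map at 4k" policy discussed in this mail.
 */
static enum pg_level non_accept_fault_level(void)
{
	return PG_LEVEL_4K;
}
```

Because `non_accept_fault_level()` always returns the smallest level, `classify_accept()` can never report ACCEPT_VMM_MUST_DEMOTE for such a mapping, whatever size the guest later accepts at. That is the point made above: mapping at 4k for non-accept EPT violations sidesteps the demote path entirely.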

But I think Kai's question was: why are we complicating the code for the case of
non-Linux TDs that also use #VE for accept? The extra handling isn't needed for
such TDs to function, and there aren't any known TDs like that which are expected
to run on KVM today (err, except the MMU stress test). So, put another way: should
we optimize KVM for a case we don't even know anyone will use? The answer seems
obviously no to me.

I think this connects to the question of whether we can pass the necessary info
into the fault via a synthetic error code. Consider this new design:

 - tdx_gmem_private_max_mapping_level() simply returns 4k for prefetch faults and
   for a pre-runnable TD, and 2MB otherwise.
 - If the fault has accept info with a 2MB size, pass 2MB into the fault.
   Otherwise pass 4k (i.e. VMs that rely on #VE to do the accept won't get huge
   pages *yet*).
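
A minimal sketch of the two bullets above, assuming the fault already carries an optional accept level. `struct fault_info`, `max_private_level()` (standing in for tdx_gmem_private_max_mapping_level()), `fault_req_level()` and `fault_map_level()` are hypothetical; KVM's real fault path uses different structures. Taking the smaller of the cap and the requested level is also an assumption about how the two combine.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the x86 page levels; only the ordering matters here. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Hypothetical summary of a private fault; not KVM's struct kvm_page_fault. */
struct fault_info {
	bool prefetch;              /* fault is a prefetch */
	bool td_runnable;           /* TD has reached RUNNABLE */
	bool has_accept_level;      /* EPT violation came from TDACCEPT */
	enum pg_level accept_level; /* valid only if has_accept_level */
};

/* First bullet: cap on the mapping level for private memory. */
static enum pg_level max_private_level(const struct fault_info *f)
{
	if (f->prefetch || !f->td_runnable)
		return PG_LEVEL_4K;
	return PG_LEVEL_2M;
}

/* Second bullet: level requested by the fault itself. */
static enum pg_level fault_req_level(const struct fault_info *f)
{
	if (!f->has_accept_level)
		return PG_LEVEL_4K; /* no accept info: no huge page *yet* */
	assert(f->accept_level >= PG_LEVEL_4K && f->accept_level <= PG_LEVEL_1G);
	/* Only the 2MB case is spelled out; other sizes get 4K in this sketch. */
	return f->accept_level == PG_LEVEL_2M ? PG_LEVEL_2M : PG_LEVEL_4K;
}

/* Assumed combination: map at the smaller of the cap and the request. */
static enum pg_level fault_map_level(const struct fault_info *f)
{
	enum pg_level max = max_private_level(f);
	enum pg_level req = fault_req_level(f);

	return req < max ? req : max;
}
```

With this shape the only per-fault input beyond the TD state is the accept level itself, so nothing needs to be stashed on the vCPU between the EPT violation and the mapping.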

What goes wrong? It seems simpler, and there's no more stuffing of fault info on
the vCPU.
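
One way to pass the accept level into the fault via a synthetic error code, as floated in this mail, is to reserve a small field in the 64-bit error code handed to the fault handler. A minimal sketch, assuming a two-bit field at a made-up position: the bit positions, the SYN_ACCEPT_LEVEL_* macros and both helpers are hypothetical and do not match KVM's actual PFERR_* layout.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for x86 page levels; 0 means "no accept info". */
enum pg_level { PG_LEVEL_NONE = 0, PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Hypothetical synthetic field: bits 60-61 of the error code. */
#define SYN_ACCEPT_LEVEL_SHIFT 60
#define SYN_ACCEPT_LEVEL_MASK  (UINT64_C(3) << SYN_ACCEPT_LEVEL_SHIFT)

/* Store the TDACCEPT level in the synthetic field, replacing any old value. */
static uint64_t encode_accept_level(uint64_t error_code, enum pg_level level)
{
	assert((unsigned int)level <= PG_LEVEL_1G);
	error_code &= ~SYN_ACCEPT_LEVEL_MASK;
	return error_code | ((uint64_t)level << SYN_ACCEPT_LEVEL_SHIFT);
}

/* Recover the level; PG_LEVEL_NONE if the exit carried no accept info. */
static enum pg_level decode_accept_level(uint64_t error_code)
{
	return (enum pg_level)((error_code & SYN_ACCEPT_LEVEL_MASK) >>
			       SYN_ACCEPT_LEVEL_SHIFT);
}
```

The hardware-defined low bits of the error code pass through untouched, and the fault handler reads the level straight from the error code instead of from per-vCPU state.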