linux-kernel - Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is RUNNABLE


Message-ID: <02f8678221629a0aa05a73bcade8e1fe6f3aa1e5.camel@intel.com>
Date: Tue, 20 May 2025 23:34:52 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "Shutemov, Kirill" <kirill.shutemov@...el.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Du, Fan" <fan.du@...el.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, "Li,
 Zhiquan1" <zhiquan1.li@...el.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
	"tabba@...gle.com" <tabba@...gle.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Weiny, Ira" <ira.weiny@...el.com>, "michael.roth@....com"
	<michael.roth@....com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pgonda@...gle.com"
	<pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is
 RUNNABLE

On Mon, 2025-05-19 at 16:32 +0800, Zhao, Yan Y wrote:
> > But in the above text you mentioned that, if doing so, because we choose to
> > ignore splitting request on read, returning 2M could result in *endless* EPT
> > violation.
> I don't get what you mean.
> What's the relationship between splitting and "returning 2M could result in
> *endless* EPT" ?
> 
> > So to me it seems you choose a design that could bring performance gain for
> > certain non-Linux TDs when they follow a certain behaviour but otherwise could
> > result in endless EPT violation in KVM.
> Also don't understand here.
> Which design could result in endless EPT violation?

[Sorry, somehow I didn't see your replies in my mailbox yesterday.]

You mentioned the following in your cover letter:

    (b) with shared kvm->mmu_lock, triggered by fault.

    ....

    This series simply ignores the splitting request in the fault path to
    avoid unnecessary bounces between levels. The vCPU that performs ACCEPT
    at a lower level would finally figures out the page has been accepted
    at a higher level by another vCPU.

    ... The worst outcome to ignore the resulting
    splitting request is an endless EPT violation. This would not happen
    for a Linux guest, which does not expect any #VE.

So to me, IIUC, this means:

 - this series chooses to ignore the splitting request on read ..
 - the worst outcome of ignoring the resulting splitting request is an endless
   EPT violation..

And this happens exactly in the case below:

 1) Guest touches a 4K page
 2) KVM AUGs 2M page
 3) Guest re-accesses that 4K page, and receives #VE
 4) Guest ACCEPTs that 4K page, this triggers EPT violation

IIUC, you choose to ignore splitting the large page in step 4) (am I right?).
Then, if the guest always ACCEPTs the page at 4K level, KVM will have *endless
EPT violation*.

So, is this the "worst outcome to ignore the resulting splitting request" that
you mentioned in your changelog?

If it is, then why is it OK?

It is OK *ONLY* when "guest always ACCEPTs 4K page" is a buggy behaviour of the
guest itself (which KVM is not responsible for).  I.e., the guest is always
supposed to find the page size that KVM has AUGed upon receiving the #VE (does
the #VE contain such information?) and then do ACCEPT at that page level.

Otherwise, if it's a legal behaviour for the guest to always ACCEPT at 4K level,
then I don't think it's OK to have endless EPT violation in KVM.
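
To make the two cases concrete, here is a toy model of the scenario above.  It
is only a sketch of my understanding, not KVM or TDX module code: every name in
it (SIZE_4K, always_4k, follow_mapping, ept_violations_until_accept) is made up
for illustration, and follow_mapping assumes the guest can somehow learn the
level KVM AUGed, which is exactly the open question above.  The retry cap exists
only so that the model terminates.

```python
# Toy model only: none of these names exist in KVM or the TDX module.
SIZE_4K = "4K"
SIZE_2M = "2M"


def always_4k(mapped, requested):
    """Guest policy: keep retrying ACCEPT at 4K no matter what."""
    return SIZE_4K


def follow_mapping(mapped, requested):
    """Guest policy: retry ACCEPT at the level the host mapped.

    Assumption: the guest has some way to learn that level.
    """
    return mapped


def ept_violations_until_accept(guest_policy, split_on_fault, max_faults=8):
    """Count EPT violations until ACCEPT succeeds.

    Returns None if max_faults is reached first, which stands in for
    "endless" in this model.
    """
    mapped = SIZE_2M     # step 2: KVM AUGs a 2M page
    requested = SIZE_4K  # step 4: guest ACCEPTs at 4K after the #VE
    faults = 0
    while requested != mapped:
        if faults == max_faults:
            return None
        faults += 1  # level mismatch on ACCEPT -> EPT violation in the host
        if split_on_fault:
            mapped = SIZE_4K  # host honours the splitting request
        else:
            # Host ignores the splitting request; the guest decides how to retry.
            requested = guest_policy(mapped, requested)
    return faults


if __name__ == "__main__":
    print(ept_violations_until_accept(always_4k, split_on_fault=False))
    print(ept_violations_until_accept(follow_mapping, split_on_fault=False))
    print(ept_violations_until_accept(always_4k, split_on_fault=True))
```

In this model, with the splitting request ignored, always_4k hits the cap
(None) while follow_mapping converges after one EPT violation.  If the host
splits instead, always_4k also converges after one EPT violation.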