Message-ID: <aC07+s9VvNFCG1ZI@yzhao56-desk.sh.intel.com>
Date: Wed, 21 May 2025 10:35:38 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
CC: "Shutemov, Kirill" <kirill.shutemov@...el.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Du, Fan" <fan.du@...el.com>, "Hansen, Dave"
	<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, "Li,
 Zhiquan1" <zhiquan1.li@...el.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
	"tabba@...gle.com" <tabba@...gle.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Weiny, Ira" <ira.weiny@...el.com>, "michael.roth@....com"
	<michael.roth@....com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pgonda@...gle.com"
	<pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is
 RUNNABLE

On Wed, May 21, 2025 at 07:34:52AM +0800, Huang, Kai wrote:
> On Mon, 2025-05-19 at 16:32 +0800, Zhao, Yan Y wrote:
> > > But in the above text you mentioned that, if doing so, because we choose to
> > > ignore splitting request on read, returning 2M could result in *endless* EPT
> > > violation.
> > I don't get what you mean.
> > What's the relationship between splitting and "returning 2M could result in
> > *endless* EPT" ?
> > 
> > > So to me it seems you choose a design that could bring performance gain for
> > > certain non-Linux TDs when they follow a certain behaviour but otherwise could
> > > result in endless EPT violation in KVM.
> > Also don't understand here.
> > Which design could result in endless EPT violation?
> 
> [Sorry somehow I didn't see your replies yesterday in my mailbox.]
> 
> You mentioned below in your coverletter:
> 
>     (b) with shared kvm->mmu_lock, triggered by fault.
> 
>     ....
> 
>     This series simply ignores the splitting request in the fault path to
>     avoid unnecessary bounces between levels. The vCPU that performs ACCEPT
>     at a lower level would finally figures out the page has been accepted
>     at a higher level by another vCPU.
> 
>     ... The worst outcome to ignore the resulting
>     splitting request is an endless EPT violation. This would not happen
>     for a Linux guest, which does not expect any #VE.
> 
> So to me, IIUC, this means:
> 
>  - this series choose to ignore splitting request when read ..
>  - the worse outcome to ignore the resulting splitting request is an endless
>    EPT violation..
> 
> And this happens exactly in below case:
> 
>  1) Guest touches a 4K page
>  2) KVM AUGs 2M page
>  3) Guest re-accesses that 4K page, and receives #VE
>  4) Guest ACCEPTs that 4K page, this triggers EPT violation
> 
> IIUC, you choose to ignore splitting large page in step 4) (am I right???). 
> Then if guest always ACCEPTs page at 4K level, then KVM will have *endless EPT
> violation*.
> 
> So, is this the "worst outcome to ignore the resulting splitting request" that
> you mentioned in your changelog?
> 
> If it is, then why is it OK?
Initially I assumed the guest should always accept in the sequence of
"1G->2M->4K", as a Linux guest does.

If that were true, we could simply ignore the splitting request in the fault
(shared) path, because it would be the guest that is not following the
convention.

However, Kirill and you are right, the guest can accept at 4K.

Given that, the "worst outcome to ignore the resulting splitting request" (an
endless EPT violation) is not OK.
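
To make the failure mode concrete, here is a toy model of steps 1) to 4) above.
It is only a sketch: the function name, the level constants and the fault
budget are made up for illustration and do not correspond to any KVM or TDX
module code. It assumes the guest retries ACCEPT at the same level after every
EPT violation.

```python
PG_4K, PG_2M = "4K", "2M"  # hypothetical level names, illustration only


def faults_until_accept(mapped_level, accept_level, split_in_fault_path,
                        max_faults=5):
    """Count EPT violations until the guest's ACCEPT succeeds.

    Returns the number of violations taken, or None if ACCEPT still has not
    succeeded after max_faults violations (standing in for "endless").
    """
    for faults in range(max_faults + 1):
        if mapped_level == accept_level:
            return faults  # levels match, ACCEPT succeeds
        # Level mismatch: the ACCEPT triggers an EPT violation.
        if split_in_fault_path and mapped_level == PG_2M:
            mapped_level = PG_4K  # fault path honours the splitting request
        # Otherwise the splitting request is ignored and nothing changes,
        # so the guest's retry hits exactly the same mismatch.
    return None


# KVM AUGed 2M, guest ACCEPTs at 4K, splitting ignored: never converges.
print(faults_until_accept(PG_2M, PG_4K, split_in_fault_path=False))
# Same guest, but the fault path is allowed to split: one violation.
print(faults_until_accept(PG_2M, PG_4K, split_in_fault_path=True))
```

In this model the only ways out of the loop are for the guest to change the
level it accepts at or for the fault path to split.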

> It is OK *ONLY* when "guest always ACCEPTs 4K page" is a buggy behaviour of the
> guest itself (which KVM is not responsible for).  I.e., the guest is always
> supposed to find the page size that KVM has AUGed upon receiving the #VE (does
> the #VE contain such information?) and then do ACCEPT at that page level.
> 
> Otherwise, if it's a legal behaviour for the guest to always ACCEPT at 4K level,
> then I don't think it's OK to have endless EPT violation in KVM.
We can avoid the endless EPT violation by allowing splitting in the fault
path, though that requires introducing several locks in the TDX code. I had a
POC for that, but we felt it was better to keep the initial support simple.

So, if we all agree not to support huge pages for non-Linux TDs as an initial
step, your proposal is a good way to keep the splitting code simple.
