Message-ID: <aCcIrjw9B2h0YjuV@yzhao56-desk.sh.intel.com>
Date: Fri, 16 May 2025 17:43:10 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Huang, Kai" <kai.huang@...el.com>
CC: "pbonzini@...hat.com" <pbonzini@...hat.com>, "seanjc@...gle.com"
<seanjc@...gle.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
<xiaoyao.li@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"Hansen, Dave" <dave.hansen@...el.com>, "david@...hat.com"
<david@...hat.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>,
"vbabka@...e.cz" <vbabka@...e.cz>, "tabba@...gle.com" <tabba@...gle.com>,
"Du, Fan" <fan.du@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
"Weiny, Ira" <ira.weiny@...el.com>, "michael.roth@....com"
<michael.roth@....com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"Peng, Chao P" <chao.p.peng@...el.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is
RUNNABLE
On Fri, May 16, 2025 at 09:35:37AM +0800, Huang, Kai wrote:
> On Tue, 2025-05-13 at 20:10 +0000, Edgecombe, Rick P wrote:
> > > @@ -3265,7 +3263,7 @@ int tdx_gmem_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
> > >  	if (unlikely(to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE))
> > >  		return PG_LEVEL_4K;
> > >
> > > -	return PG_LEVEL_4K;
> > > +	return PG_LEVEL_2M;
> >
> > Maybe combine this with patch 4, or split them into sensible categories.
>
> How about merge with patch 12
>
> [RFC PATCH 12/21] KVM: TDX: Determine max mapping level according to vCPU's
> ACCEPT level
>
> instead?
>
> Per patch 12, the fault due to TDG.MEM.PAGE.ACCEPT contains fault level info, so
> KVM should just return that. But it seems we are still returning PG_LEVEL_2M if
> no such info is provided (IIUC):

Yes, without such info (tdx->violation_request_level), we always return
PG_LEVEL_2M.

> int tdx_gmem_private_max_mapping_level(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> 				       gfn_t gfn)
> {
> +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> +
> 	if (unlikely(to_kvm_tdx(vcpu->kvm)->state != TD_STATE_RUNNABLE))
> 		return PG_LEVEL_4K;
>
> +	if (gfn >= tdx->violation_gfn_start && gfn < tdx->violation_gfn_end)
> +		return tdx->violation_request_level;
> +
> 	return PG_LEVEL_2M;
> }
>
> So why not return PG_LEVEL_4K at the end?
>
> I am asking because of the text below from the cover letter:
>
> A rare case that could lead to splitting in the fault path is when a TD
> is configured to receive #VE and accesses memory before the ACCEPT
> operation. By the time a vCPU accesses a private GFN, due to the lack
> of any guest preferred level, KVM could create a mapping at 2MB level.
> If the TD then only performs the ACCEPT operation at 4KB level,
> splitting in the fault path will be triggered. However, this is not
> regarded as a typical use case, as usually TD always accepts pages in
> the order from 1GB->2MB->4KB. The worst outcome to ignore the resulting
> splitting request is an endless EPT violation. This would not happen
> for a Linux guest, which does not expect any #VE.
>
> Changing to return PG_LEVEL_4K should avoid this problem. It doesn't hurt

For TDs that expect #VE, the guest accesses private memory before accepting it.
In that case, when KVM receives the EPT violation, there is no expected level
from the TDX module. Returning PG_LEVEL_4K at the end basically disables huge
pages for those TDs.

Besides, according to Kirill [1], the 1GB->2MB->4KB accept order only holds for
Linux guests.

[1] https://lore.kernel.org/all/6vdj4mfxlyvypn743klxq5twda66tkugwzljdt275rug2gmwwl@zdziylxpre6y/#t

> normal cases either, since guest will always do ACCEPT (which contains the
> accepting level) before accessing the memory.
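
To make the trade-off concrete, here is a minimal standalone sketch of the level
selection with the final fallback turned into a parameter. It is not the KVM
code: struct vcpu_model, max_mapping_level() and the fallback argument are
hypothetical names for illustration, and the enums are local stand-ins for the
kernel's PG_LEVEL_* and TD_STATE_* definitions.

```c
#include <stdint.h>

/* Local stand-ins for kernel definitions (assumed ordering: 4K < 2M < 1G). */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };
enum td_state { TD_STATE_UNINITIALIZED, TD_STATE_INITIALIZED, TD_STATE_RUNNABLE };

typedef uint64_t gfn_t;

/* Hypothetical flattened model of the state consulted in patch 12. */
struct vcpu_model {
	enum td_state state;                    /* TD lifecycle state */
	gfn_t violation_gfn_start;              /* inclusive start of the ACCEPT range */
	gfn_t violation_gfn_end;                /* exclusive end of the ACCEPT range */
	enum pg_level violation_request_level;  /* level requested by the ACCEPT */
};

/*
 * Pick the max mapping level for a private GFN.
 * 'fallback' is what gets returned when the fault carries no level info,
 * i.e. the value under discussion (PG_LEVEL_2M vs. PG_LEVEL_4K).
 */
static enum pg_level max_mapping_level(const struct vcpu_model *v, gfn_t gfn,
				       enum pg_level fallback)
{
	/* Before the TD is runnable, only 4K mappings are allowed. */
	if (v->state != TD_STATE_RUNNABLE)
		return PG_LEVEL_4K;

	/* Fault caused by an ACCEPT: honor the level the guest asked for. */
	if (gfn >= v->violation_gfn_start && gfn < v->violation_gfn_end)
		return v->violation_request_level;

	/* No guest-preferred level known (e.g. access before ACCEPT with #VE). */
	return fallback;
}
```

With fallback == PG_LEVEL_4K, a TD that touches memory before accepting it never
gets a 2MB mapping from this path, which is the "disables huge pages" case
above. With fallback == PG_LEVEL_2M, the same access may be mapped at 2MB and a
later 4KB ACCEPT then needs a split.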