Message-ID: <aFC8YThVdrIyAsuS@yzhao56-desk.sh.intel.com>
Date: Tue, 17 Jun 2025 08:52:49 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "Du, Fan" <fan.du@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
"Huang, Kai" <kai.huang@...el.com>, "quic_eberman@...cinc.com"
<quic_eberman@...cinc.com>, "Hansen, Dave" <dave.hansen@...el.com>,
"david@...hat.com" <david@...hat.com>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "vbabka@...e.cz" <vbabka@...e.cz>, "Li, Zhiquan1"
<zhiquan1.li@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"michael.roth@....com" <michael.roth@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"Peng, Chao P" <chao.p.peng@...el.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "Weiny, Ira" <ira.weiny@...el.com>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Annapurve, Vishal" <vannapurve@...gle.com>, "tabba@...gle.com"
<tabba@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
<jun.miao@...el.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is
RUNNABLE
On Tue, Jun 17, 2025 at 06:49:00AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-06-16 at 11:14 +0800, Yan Zhao wrote:
> > > Oh, nice. I hadn't seen this. Agree that a comprehensive guest setup is quite
> > > manual. But here we are playing with guest ABI. In practice, yes it's similar to
> > > passing yet another arg to get a good TD.
> > Could we introduce a TD attr TDX_ATTR_SEPT_EXPLICIT_DEMOTION?
> >
> > It can be something similar to TDX_ATTR_SEPT_VE_DISABLE, except that we don't
> > provide a dynamic mechanism (like TDCS_CONFIG_FLEXIBLE_PENDING_VE, which lets
> > the guest turn SEPT_VE_DISABLE on/off).
> > (See disable_sept_ve() in ./arch/x86/coco/tdx/tdx.c).
> >
> > So, if userspace configures a TD with TDX_ATTR_SEPT_EXPLICIT_DEMOTION, KVM first
> > checks if SEPT_EXPLICIT_DEMOTION is supported.
> > The guest can also check SEPT_EXPLICIT_DEMOTION and, depending on whether it
> > wants to support it, decide to continue or shut down. (If it does not check
> > SEPT_EXPLICIT_DEMOTION, e.g., if we don't want to update EDK2, the guest must
> > accept memory before accessing it).
> >
> > - if TD is configured with SEPT_EXPLICIT_DEMOTION, KVM is allowed to map at 2MB
> >   when there's no level info in an EPT violation. The guest must accept memory
> >   before accessing it, or, if it wants to accept only part of the host's
> >   mapping, it needs to explicitly invoke a TDVMCALL to request that KVM perform
> >   page demotion.
> >
> > - if TD is configured without SEPT_EXPLICIT_DEMOTION, KVM always maps at 4KB
> > when there's no level info in an EPT violation.
> >
> > - Regardless of whether SEPT_EXPLICIT_DEMOTION is configured, if there's level
> >   info in an EPT violation, KVM honors it as the max_level info but ignores the
> >   demotion request in the fault path.
>
> I think this is what Sean was suggesting. We are going to need a qemu command
> line opt-in too.
>
> >
> > > We can start with a prototype of the host side arg and see how it turns out. I
> > > realized we need to verify edk2 as well.
> > Current EDK2 should always accept pages before actual memory access.
> > So, I think it should be fine.
>
> It's not just that, it needs to handle the accept page size being lower than
> the mapping size. I went and looked and it is accepting at 4k size in places. It
As it accepts pages before memory access, the "accept page size being lower than
the mapping size" case can't happen.
> hopefully is just handling accepting a whole range that is not 2MB aligned. But
> I think we need to verify this more.
Ok.
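
For reference, the level selection described in the three bullets quoted above
could be sketched roughly as below. This is only a minimal illustration, not
KVM code: the enum, the function name tdx_max_map_level() and its parameters
are hypothetical, and SEPT_EXPLICIT_DEMOTION is still just a proposed
attribute. The sketch covers only the max-level choice; the TDVMCALL-driven
demotion and the "ignore demotion in the fault path" part are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels for illustration only; not the kernel's definitions. */
enum map_level { LEVEL_NONE = 0, LEVEL_4K = 1, LEVEL_2M = 2 };

/*
 * Pick the max mapping level for an EPT violation on private memory.
 *
 * explicit_demotion: the TD was configured with the proposed
 *                    SEPT_EXPLICIT_DEMOTION attribute (assumption).
 * fault_level:       level info carried by the EPT violation, or
 *                    LEVEL_NONE if the violation carries no level info.
 */
static enum map_level tdx_max_map_level(bool explicit_demotion,
					enum map_level fault_level)
{
	assert(fault_level >= LEVEL_NONE && fault_level <= LEVEL_2M);

	/* Level info present: honor it as max_level, attribute or not. */
	if (fault_level != LEVEL_NONE)
		return fault_level;

	/* No level info: map at 2MB only if the TD opted in, else 4KB. */
	return explicit_demotion ? LEVEL_2M : LEVEL_4K;
}
```

With this shape, a TD without the attribute keeps today's 4KB behavior when
there is no level info, so an unmodified guest would not see a change.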