Message-ID: <ffb401e800363862c5dd90664993e8e234c7361b.camel@intel.com>
Date: Mon, 16 Jun 2025 22:49:00 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "Du, Fan" <fan.du@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
"Huang, Kai" <kai.huang@...el.com>, "quic_eberman@...cinc.com"
<quic_eberman@...cinc.com>, "Hansen, Dave" <dave.hansen@...el.com>,
"david@...hat.com" <david@...hat.com>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "vbabka@...e.cz" <vbabka@...e.cz>, "Li, Zhiquan1"
<zhiquan1.li@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"michael.roth@....com" <michael.roth@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"Peng, Chao P" <chao.p.peng@...el.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "Weiny, Ira" <ira.weiny@...el.com>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"Annapurve, Vishal" <vannapurve@...gle.com>, "tabba@...gle.com"
<tabba@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
<jun.miao@...el.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 09/21] KVM: TDX: Enable 2MB mapping size after TD is
RUNNABLE
On Mon, 2025-06-16 at 11:14 +0800, Yan Zhao wrote:
> > Oh, nice. I hadn't seen this. Agree that a comprehensive guest setup is quite
> > manual. But here we are playing with guest ABI. In practice, yes it's similar
> > to passing yet another arg to get a good TD.
> Could we introduce a TD attr TDX_ATTR_SEPT_EXPLICIT_DEMOTION?
>
> It can be something similar to TDX_ATTR_SEPT_VE_DISABLE, except that we don't
> provide a dynamic mechanism like TDCS_CONFIG_FLEXIBLE_PENDING_VE, which allows
> the guest to turn SEPT_VE_DISABLE on/off.
> (See disable_sept_ve() in ./arch/x86/coco/tdx/tdx.c).
>
> So, if userspace configures a TD with TDX_ATTR_SEPT_EXPLICIT_DEMOTION, KVM
> first checks if SEPT_EXPLICIT_DEMOTION is supported.
> The guest can also check SEPT_EXPLICIT_DEMOTION, depending on whether it wants
> to support it, to decide whether to continue or shut down. (If it does not
> check SEPT_EXPLICIT_DEMOTION, e.g., if we don't want to update EDK2, the guest
> must accept memory before accessing it).
>
> - if TD is configured with SEPT_EXPLICIT_DEMOTION, KVM is allowed to map at
>   2MB when there's no level info in an EPT violation. The guest must accept
>   memory before accessing it, or, if it wants to accept only part of the
>   host's mapping, it needs to explicitly invoke a TDVMCALL to request KVM to
>   perform page demotion.
>
> - if TD is configured without SEPT_EXPLICIT_DEMOTION, KVM always maps at 4KB
>   when there's no level info in an EPT violation.
>
> - Whether or not SEPT_EXPLICIT_DEMOTION is configured, if there's level info
>   in an EPT violation, KVM honors the level info as the max_level, but
>   ignores the demotion request in the fault path.
I think this is what Sean was suggesting. We are going to need a qemu command
line opt-in too.
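
For reference, the level-selection rules quoted above could be sketched roughly
as below. This is only an illustration of the proposal as I read it, not KVM
code: the enum, the struct, and max_map_level() are hypothetical names made up
for this sketch, and the explicit_demotion flag stands in for a TD configured
with the proposed TDX_ATTR_SEPT_EXPLICIT_DEMOTION.

```c
#include <stdbool.h>

/* Hypothetical mapping levels for this sketch (not the KVM definitions). */
enum map_level { MAP_LEVEL_4K = 1, MAP_LEVEL_2M = 2 };

/* Hypothetical summary of what an EPT violation told the host. */
struct fault_info {
	bool has_level;        /* the EPT violation carried level info */
	enum map_level level;  /* only meaningful when has_level is true */
};

/*
 * Pick the maximum mapping level for a fault.
 *
 * - Level info present: honor it as the max level, whether or not the TD
 *   opted in to explicit demotion. The fault path never demotes an
 *   existing mapping; under the proposal, demotion would only happen via
 *   an explicit TDVMCALL from the guest.
 * - No level info, TD opted in: 2MB is allowed.
 * - No level info, TD did not opt in: always 4KB.
 */
static enum map_level max_map_level(bool explicit_demotion,
				    const struct fault_info *fault)
{
	if (fault->has_level)
		return fault->level;

	return explicit_demotion ? MAP_LEVEL_2M : MAP_LEVEL_4K;
}
```

The opt-in only changes the result when the fault carries no level info, which
is the case a guest that does not accept before access could get wrong.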
>
> > We can start with a prototype of the host side arg and see how it turns out.
> > I realized we need to verify edk2 as well.
> Current EDK2 should always accept pages before actual memory access.
> So, I think it should be fine.
It's not just that, it needs to handle the accept page size being lower than
the mapping size. I went and looked and it is accepting at 4k size in places. It
hopefully is just handling accepting a whole range that is not 2MB aligned. But
I think we need to verify this more.
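
To make the benign case concrete, here is a rough sketch of what "accepting a
whole range that is not 2MB aligned" would look like: 4k accepts only for the
unaligned head and tail, 2MB accepts for everything in between. This is not
EDK2 code and makes no TDX calls; plan_accept() and struct accept_plan are
hypothetical names, and it assumes start and end are 4k aligned. It only counts
how many accepts of each size such a loop would issue.

```c
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_2M 0x200000ULL

/* Hypothetical tally of accept operations by page size. */
struct accept_plan {
	uint64_t n_4k;  /* number of 4k accepts */
	uint64_t n_2m;  /* number of 2MB accepts */
};

/*
 * Walk [start, end) and use a 2MB accept wherever the current address is
 * 2MB aligned and at least 2MB remains; otherwise fall back to 4k.
 * Assumes start and end are both 4k aligned.
 */
static struct accept_plan plan_accept(uint64_t start, uint64_t end)
{
	struct accept_plan plan = { 0, 0 };
	uint64_t addr = start;

	while (addr < end) {
		if ((addr % SZ_2M) == 0 && end - addr >= SZ_2M) {
			plan.n_2m++;
			addr += SZ_2M;
		} else {
			plan.n_4k++;
			addr += SZ_4K;
		}
	}

	return plan;
}
```

If the 4k accepts in edk2 follow this pattern, a 2MB host mapping is only ever
accepted in 4k pieces where the range itself does not cover the full 2MB
region. Any other 4k accept inside a fully covered 2MB region would be the case
that needs a closer look.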