linux-kernel - Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE


Message-ID: <bb4e47e7569549d1bb288228e0f7976936c4410c.camel@intel.com>
Date: Mon, 23 Jun 2025 22:51:46 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Annapurve, Vishal" <vannapurve@...gle.com>
CC: "Gao, Chao" <chao.gao@...el.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Huang, Kai" <kai.huang@...el.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Chatre, Reinette" <reinette.chatre@...el.com>,
	"Li, Xiaoyao" <xiaoyao.li@...el.com>, "Hunter, Adrian"
	<adrian.hunter@...el.com>, "tony.lindgren@...ux.intel.com"
	<tony.lindgren@...ux.intel.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "kirill.shutemov@...ux.intel.com"
	<kirill.shutemov@...ux.intel.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"Zhao, Yan Y" <yan.y.zhao@...el.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM

On Mon, 2025-06-23 at 13:22 -0700, Vishal Annapurve wrote:
> A simple question I ask to myself is that if a certain memory specific
> optimization/feature is enabled for non-confidential VMs, why it can't
> be enabled for Confidential VMs. I think as long as we cleanly
> separate memory management from RMP/SEPT management for CVMs, there
> should ideally be no major issues with enabling such optimizations for
> Confidential VMs.

Yes, having them work the same should probably help with maintainability, as
long as making them work the same doesn't cause too much complexity somewhere
else, i.e. kind of what we were discussing here.

> 
> Just memory allocation without zeroing, even with hugepages takes time
> for large VM shapes and I don't really see a valid reason for the
> userspace VMM to repeat the freeing and allocation cycles.

Hmm, this is surprising to me. Do you have any idea what kind of cycles we are
talking about?

> 
> > For TDX though, hmm, we may not actually need to zero the private pages because
> > of the transition to keyid 0. It would be beneficial to have the different VMs
> > types work the same. But, under this speculation of the real benefit, there may
> > be other ways to get the same benefits that are worth considering when we hit
> > frictions like this. To do that kind of consideration though, everyone needs to
> > understand what the real goal is.
> > 
> > In general I think we really need to fully evaluate these optimizations as part
> > of the upstreaming process. We have already seen two post-base series TDX
> > optimizations that didn't stand up under scrutiny. It turned out the existing
> > TDX page promotion implementation wasn't actually getting used much if at all.
> > Also, the parallel TD reclaim thing turned out to be misguided once we looked
> 
> For a ~700G guest memory, guest shutdown times:
> 1) Parallel TD reclaim + hugepage EPT mappings  : 30 secs
> 2) TD shutdown with KVM_TDX_TERMINATE_VM + hugepage EPT mappings: 2 mins
> 3) Without any optimization: ~ 30-40 mins
> 
> KVM_TDX_TERMINATE_VM for now is a very good start and is much simpler
> to upstream.

Parallel reclaim is misguided because it's attacking the wrong root cause. It's
not an example of a bad goal, but a pitfall of requiring a specific solution
instead of reviewing the reasoning as part of the upstreaming process. We
shouldn't do a parallel reclaim ioctl because there are simpler, faster ways to
reclaim. It looks like we never circled back on this though. My bad for bringing it
up as an example for explaining the details.

We have not posted the alternate approach because we have too many TDX series in
progress on the list and I think we should do them iteratively. Also, as you say,
huge pages + KVM_TDX_TERMINATE_VM gets us an order of magnitude of the way there.
It puts further improvements down the priority list.

> 
> > into the root cause. So if we blindly incorporate optimizations based on vague
> > or promised justification, it seems likely we will end up maintaining some
> > amount of complex code with no purpose. Then it will be difficult to prove later
> > that it is not needed, and just remain a burden.
> > 
> > So can we please start explaining more of the "why" for this stuff so we can get
> > to the best upstream solution?

So the answer is no? I think we should close it because it seems to be
generating a lot of mails with the same pattern.