Message-ID: <2c04ba99e403a277c3d6b9ce0d6a3cb9f808caef.camel@intel.com>
Date: Mon, 23 Jun 2025 16:23:32 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Annapurve, Vishal" <vannapurve@...gle.com>
CC: "Gao, Chao" <chao.gao@...el.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
"Huang, Kai" <kai.huang@...el.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "Chatre, Reinette" <reinette.chatre@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Hunter,
Adrian" <adrian.hunter@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM

On Fri, 2025-06-20 at 20:00 -0700, Vishal Annapurve wrote:
> > Can you provide enough information to evaluate how the whole problem is being
> > solved? (it sounds like you have the full solution implemented?)
> >
> > The problem seems to be that rebuilding a whole TD for reboot is too slow. Does
> > the S-EPT survive if the VM is destroyed? If not, how does keeping the pages in
> > guestmemfd help with re-faulting? If the S-EPT is preserved, then what happens
> > when the new guest re-accepts it?
>
> SEPT entries don't survive reboots.
>
> The faulting-in I was referring to is just allocation of memory pages
> for guest_memfd offsets.
>
> >
> > >
> > > >
> > > > The series Vishal linked has some kind of SEV state transfer thing. How is
> > > > it
> > > > intended to work for TDX?
> > >
> > > The series[1] unblocks intrahost-migration [2] and reboot usecases.
> > >
> > > [1] https://lore.kernel.org/lkml/cover.1747368092.git.afranji@google.com/#t
> > > [2] https://lore.kernel.org/lkml/cover.1749672978.git.afranji@google.com/#t
> >
> > The question was: how was this reboot optimization intended to work for TDX? Are
> > you saying that it works via intra-host migration? Like some state is migrated
> > to the new TD to start it up?
>
> Reboot optimization is not specific to TDX, it's basically just about
> trying to reuse the same physical memory for the next boot. No state
> is preserved here except the mapping of guest_memfd offsets to
> physical memory pages.

Hmm, it doesn't sound like much work, especially at the 1GB level. I wonder if
the real cost is zeroing the pages. If they went back to a global allocator and
were then re-allocated, they would need to be zeroed to make sure data is not
leaked to another userspace process. But if the pages stay with the fd, that
zeroing could be skipped?
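
To make that speculation concrete, here is a rough sketch of the kind of logic
I'm imagining, not the actual guest_memfd code. gmem_get_clean_folio() is a
made-up name for illustration:

/* Hypothetical illustration only, loosely modeled on guest_memfd faulting. */
static struct folio *gmem_get_clean_folio(struct inode *inode, pgoff_t index)
{
	struct folio *folio;
	unsigned long i;

	/*
	 * If the fd (and its page cache) survived the reboot, this finds
	 * the folio already present; otherwise it allocates a fresh one.
	 */
	folio = filemap_grab_folio(inode->i_mapping, index);
	if (IS_ERR(folio))
		return folio;

	/*
	 * Only clear folios that haven't already been handed to this
	 * guest. A folio kept in the fd across reboot is still uptodate,
	 * so the clearing loop is skipped. A page that went back to the
	 * buddy allocator and was re-allocated would start !uptodate and
	 * be cleared again.
	 */
	if (!folio_test_uptodate(folio)) {
		for (i = 0; i < folio_nr_pages(folio); i++)
			clear_highpage(folio_page(folio, i));
		folio_mark_uptodate(folio);
	}

	/* Caller unlocks and puts the folio. */
	return folio;
}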

For TDX though, hmm, we may not actually need to zero the private pages because
of the transition to keyid 0. It would be beneficial to have the different VM
types work the same. But if that speculation about the real benefit is right,
there may be other ways to get the same benefit that are worth considering when
we hit friction like this. To do that kind of consideration though, everyone
needs to understand what the real goal is.

In general I think we really need to fully evaluate these optimizations as part
of the upstreaming process. We have already seen two post-base-series TDX
optimizations that didn't stand up under scrutiny. It turned out the existing
TDX page promotion implementation wasn't actually getting used much, if at all.
Also, the parallel TD reclaim work turned out to be misguided once we looked
into the root cause. So if we blindly incorporate optimizations based on vague
or promised justification, it seems likely we will end up maintaining complex
code with no purpose. Then it will be difficult to prove later that it is not
needed, and it will just remain a burden.
So can we please start explaining more of the "why" for this stuff so we can get
to the best upstream solution?