Message-ID: <CAGtprH-an308biSmM=c=W2FS2XeOWM9CxB3vWu9D=LD__baWUQ@mail.gmail.com>
Date: Fri, 20 Jun 2025 09:14:16 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Adrian Hunter <adrian.hunter@...el.com>, pbonzini@...hat.com, kvm@...r.kernel.org,
rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
kai.huang@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com,
tony.lindgren@...ux.intel.com, binbin.wu@...ux.intel.com,
isaku.yamahata@...el.com, linux-kernel@...r.kernel.org, yan.y.zhao@...el.com,
chao.gao@...el.com
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
On Fri, Jun 20, 2025 at 7:24 AM Sean Christopherson <seanjc@...gle.com> wrote:
>
> On Thu, Jun 19, 2025, Adrian Hunter wrote:
> > On 19/06/2025 03:33, Sean Christopherson wrote:
> > > On Wed, Jun 18, 2025, Adrian Hunter wrote:
> > >> On 18/06/2025 09:00, Vishal Annapurve wrote:
> > >>> On Tue, Jun 17, 2025 at 10:50 PM Adrian Hunter <adrian.hunter@...el.com> wrote:
> > >>>>> Ability to clean up memslots from userspace without closing
> > >>>>> VM/guest_memfd handles is useful to keep reusing the same guest_memfds
> > >>>>> for the next boot iteration of the VM in case of reboot.
> > >>>>
> > >>>> TD lifecycle does not include reboot. In other words, reboot is
> > >>>> done by shutting down the TD and then starting again with a new TD.
> > >>>>
> > >>>> AFAIK it is not currently possible to shut down without closing
> > >>>> guest_memfds since the guest_memfd holds a reference (users_count)
> > >>>> to struct kvm, and destruction begins when users_count hits zero.
> > >>>>
> > >>>
> > >>> gmem link support[1] allows associating existing guest_memfds with new
> > >>> VM instances.
> > >>>
> > >>> Breakdown of the userspace VMM flow:
> > >>> 1) Create a new VM instance before closing guest_memfd files.
> > >>> 2) Link existing guest_memfd files with the new VM instance. -> This
> > >>> creates new set of files backed by the same inode but associated with
> > >>> the new VM instance.
> > >>
> > >> So what about:
> > >>
> > >> 2.5) Call KVM_TDX_TERMINATE_VM IOCTL
> > >>
> > >> Memory reclaim done after KVM_TDX_TERMINATE_VM will be efficient,
> > >> so avoid causing memory to be reclaimed earlier.
> > >
> > > The problem is that setting kvm->vm_dead will prevent (3) from succeeding. If
> > > kvm->vm_dead is set, KVM will reject all vCPU, VM, and device (not /dev/kvm the
> > > device, but rather devices bound to the VM) ioctls.
> >
> > (3) is "Close the older guest memfd handles -> results in older VM instance cleanup."
> >
> > close() is not an IOCTL, so I do not understand.
>
> Sorry, I misread that as "Close the older guest memfd handles by deleting the
> memslots".
>
> > > I intended that behavior, e.g. to guard against userspace blowing up KVM because
> > > the hkid was released, I just didn't consider the memslots angle.
> >
> > The patch was tested with QEMU which AFAICT does not touch memslots when
> > shutting down. Is there a reason to?
>
> In this case, the VMM process is not shutting down. To emulate a reboot, the
> VMM destroys the VM, but reuses the guest_memfd files for the "new" VM. Because
> guest_memfd takes a reference to "struct kvm", through memslot bindings, memslots
guest_memfd takes a reference on the "struct kvm" only at
creation/linking time; memslot binding currently does not add any
additional references.
Adrian's suggestion makes sense and should be functional, but I am
running into some issues that likely need to be resolved on the
userspace side. I will keep this thread updated.
Currently testing this reboot flow:
1) Issue KVM_TDX_TERMINATE_VM on the old VM.
2) Close the VM fd.
3) Create a new VM fd.
4) Link the old guest_memfd handles to the new VM fd.
5) Close the old guest_memfd handles.
6) Register memslots on the new VM using the linked guest_memfd handles.
That being said, I still see the value in what Sean suggested.
"Remove vm_dead and instead reject ioctls based on vm_bugged, and simply rely
on KVM_REQ_VM_DEAD to prevent running the guest."
This will help with:
1) Keeping the cleanup sequence as close as possible to the normal VM
cleanup sequence.
2) Actual VM destruction happening at step 5 of the above flow; if any
cleanup happens asynchronously, userspace can enforce synchronous
cleanup by executing the graceful VM shutdown stages before step 2
above.
And IIUC the goal here is to achieve exactly what Sean suggested above,
i.e. to prevent running the guest after KVM_TDX_TERMINATE_VM is issued.
> need to be manually destroyed so that all references are put and the VM is freed
> by the kernel. E.g. otherwise multiple reboots would manifest as memory leaks
> and eventually OOM the host.