lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGtprH_9uq-FHHQ=APwgVCe+=_o=yrfCS9snAJfhcg3fr7Qs-g@mail.gmail.com>
Date: Mon, 23 Jun 2025 13:36:33 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Adrian Hunter <adrian.hunter@...el.com>, pbonzini@...hat.com, kvm@...r.kernel.org, 
	rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com, 
	kai.huang@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com, 
	tony.lindgren@...ux.intel.com, binbin.wu@...ux.intel.com, 
	isaku.yamahata@...el.com, linux-kernel@...r.kernel.org, yan.y.zhao@...el.com, 
	chao.gao@...el.com
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM

On Fri, Jun 20, 2025 at 9:14 AM Vishal Annapurve <vannapurve@...gle.com> wrote:
>
> On Fri, Jun 20, 2025 at 7:24 AM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Thu, Jun 19, 2025, Adrian Hunter wrote:
> > > On 19/06/2025 03:33, Sean Christopherson wrote:
> > > > On Wed, Jun 18, 2025, Adrian Hunter wrote:
> > > >> On 18/06/2025 09:00, Vishal Annapurve wrote:
> > > >>> On Tue, Jun 17, 2025 at 10:50 PM Adrian Hunter <adrian.hunter@...el.com> wrote:
> > > >>>>> Ability to clean up memslots from userspace without closing
> > > >>>>> VM/guest_memfd handles is useful to keep reusing the same guest_memfds
> > > >>>>> for the next boot iteration of the VM in case of reboot.
> > > >>>>
> > > >>>> TD lifecycle does not include reboot.  In other words, reboot is
> > > >>>> done by shutting down the TD and then starting again with a new TD.
> > > >>>>
> > > >>>> AFAIK it is not currently possible to shut down without closing
> > > >>>> guest_memfds since the guest_memfd holds a reference (users_count)
> > > >>>> to struct kvm, and destruction begins when users_count hits zero.
> > > >>>>
> > > >>>
> > > >>> gmem link support[1] allows associating existing guest_memfds with new
> > > >>> VM instances.
> > > >>>
> > > >>> Breakdown of the userspace VMM flow:
> > > >>> 1) Create a new VM instance before closing guest_memfd files.
> > > >>> 2) Link existing guest_memfd files with the new VM instance. -> This
> > > >>> creates new set of files backed by the same inode but associated with
> > > >>> the new VM instance.
> > > >>
> > > >> So what about:
> > > >>
> > > >> 2.5) Call KVM_TDX_TERMINATE_VM IOCTL
> > > >>
> > > >> Memory reclaimed after KVM_TDX_TERMINATE_VM will be done efficiently,
> > > >> so avoid causing it to be reclaimed earlier.
> > > >
> > > > The problem is that setting kvm->vm_dead will prevent (3) from succeeding.  If
> > > > kvm->vm_dead is set, KVM will reject all vCPU, VM, and device (not /dev/kvm the
> > > > device, but rather devices bound to the VM) ioctls.
> > >
> > > (3) is "Close the older guest memfd handles -> results in older VM instance cleanup."
> > >
> > > close() is not an IOCTL, so I do not understand.
> >
> > Sorry, I misread that as "Close the older guest memfd handles by deleting the
> > memslots".
> >
> > > > I intended that behavior, e.g. to guard against userspace blowing up KVM because
> > > > the hkid was released, I just didn't consider the memslots angle.
> > >
> > > The patch was tested with QEMU which AFAICT does not touch  memslots when
> > > shutting down.  Is there a reason to?
> >
> > In this case, the VMM process is not shutting down.  To emulate a reboot, the
> > VMM destroys the VM, but reuses the guest_memfd files for the "new" VM.  Because
> > guest_memfd takes a reference to "struct kvm", through memslot bindings, memslots
>
> guest_memfd takes a reference on the "struct kvm" only on
> creation/linking, currently memslot binding doesn't add additional
> references.
>
> Adrian's suggestion makes sense and it should be functional but I am
> running into some issues which likely need to be resolved on the
> userspace side. I will keep this thread updated.
>
> Currently testing this reboot flow:
> 1) Issue KVM_TDX_TERMINATE_VM on the old VM.
> 2) Close the VM fd.
> 3) Create a new VM fd.
> 4) Link the old guest_memfd handles to the new VM fd.
> 5) Close the old guest_memfd handles.
> 6) Register memslots on the new VM using the linked guest_memfd handles.
>

Apparently mmap takes a refcount on backing files.

So basically I had to modify the reboot flow as:
1) Issue KVM_TDX_TERMINATE_VM on the old VM.
2) Close the VM fd.
3) Create a new VM fd.
4) Link the old guest_memfd handles to the new VM fd.
5) Unmap the VMAs backed by the guest memfd
6) Close the old guest_memfd handles. -> Results in VM destruction
7) Setup new VMAs backed by linked guest_memfd handles.
8) Register memslots on the new VM using the linked guest_memfd handles.

I think the issue simply is that we have tied guest_memfd lifecycle
with VM lifecycle and that discussion is out of scope for this patch.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ