Message-ID: <e86aa631-bedd-44b4-b95a-9e941d14b059@intel.com>
Date: Wed, 18 Jun 2025 08:50:04 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>, <kvm@...r.kernel.org>,
<rick.p.edgecombe@...el.com>, <kirill.shutemov@...ux.intel.com>,
<kai.huang@...el.com>, <reinette.chatre@...el.com>, <xiaoyao.li@...el.com>,
<tony.lindgren@...ux.intel.com>, <binbin.wu@...ux.intel.com>,
<isaku.yamahata@...el.com>, <linux-kernel@...r.kernel.org>,
<yan.y.zhao@...el.com>, <chao.gao@...el.com>
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
On 16/06/2025 06:40, Vishal Annapurve wrote:
> On Wed, Jun 11, 2025 at 2:52 AM Adrian Hunter <adrian.hunter@...el.com> wrote:
>>
>> From: Sean Christopherson <seanjc@...gle.com>
>>
>> Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
>> which enables more efficient reclaim of private memory.
>>
>> Private memory is removed from the MMU/TDP when guest_memfds are closed. If
>> the HKID has not been released, the TDX VM is still in RUNNABLE state,
>> so pages must be removed using the "Dynamic Page Removal" procedure (see
>> the TDX Module Base spec), which involves a number of steps:
>>   1. Block further address translation
>>   2. Exit each VCPU
>>   3. Clear Secure EPT entry
>>   4. Flush/write-back/invalidate relevant caches
>>
>> However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state
>> where all TDX VM pages are effectively unmapped, so pages can be reclaimed
>> directly.
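
Loosely, the two paths look like this in pseudo-code. This is an
illustrative sketch only: the tdh_*() helpers mirror the TDH.* SEAMCALL
leaves named in the TDX Module Base spec, and kick_all_vcpus() /
flush_cache_for_page() are hypothetical stand-ins, not kernel symbols.

/* RUNNABLE TD: each private page goes through Dynamic Page Removal */
static void dynamic_page_removal(struct kvm *kvm, gfn_t gfn)
{
	tdh_mem_range_block(kvm, gfn);	/* block further address translation */
	tdh_mem_track(kvm);		/* advance the TLB tracking epoch */
	kick_all_vcpus(kvm);		/* exit each vCPU to flush stale TLBs */
	tdh_mem_page_remove(kvm, gfn);	/* clear the Secure EPT entry */
	flush_cache_for_page(gfn);	/* write back/invalidate caches */
}

/* TD_TEARDOWN (HKID released): pages are already unmapped */
static void teardown_page_reclaim(gfn_t gfn)
{
	tdh_phymem_page_reclaim(gfn);	/* direct reclaim, no block/track/exits */
}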
>>
>> Reclaiming TD pages in the TD_TEARDOWN state was seen to decrease the
>> total reclaim time. For example:
>>
>> VCPUs  Size (GB)  Before (secs)  After (secs)
>>     4         18             72            24
>>    32        107            517           134
>>    64        400           5539           467
>>
>> Link: https://lore.kernel.org/r/Z-V0qyTn2bXdrPF7@google.com
>> Link: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com
>> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>> Co-developed-by: Adrian Hunter <adrian.hunter@...el.com>
>> Signed-off-by: Adrian Hunter <adrian.hunter@...el.com>
>> ---
>>
>>
>> Changes in V4:
>>
>> Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately.
>> Use KVM_BUG_ON() instead of WARN_ON().
>> Correct kvm_trylock_all_vcpus() return value.
>>
>> Changes in V3:
>>
>> Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would
>> trigger on the error path from __tdx_td_init()
>>
>> Put cpus_read_lock() handling back into tdx_mmu_release_hkid()
>>
>> Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let
>> tdx_vm_ioctl() deal with kvm->lock
>> ....
>>
>> +static int tdx_terminate_vm(struct kvm *kvm)
>> +{
>> +	if (kvm_trylock_all_vcpus(kvm))
>> +		return -EBUSY;
>> +
>> +	kvm_vm_dead(kvm);
>
> With this no more VM ioctls can be issued on this instance. How would
> userspace VMM clean up the memslots? Is the expectation that
> guest_memfd and VM fds are closed to actually reclaim the memory?
Yes
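
For illustration, the expected userspace sequence might look like the
sketch below (assumes uapi headers that already define
KVM_TDX_TERMINATE_VM; vm_fd/gmem_fd are hypothetical descriptors the
VMM already holds):

#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void shut_down_td(int vm_fd, int gmem_fd)
{
	struct kvm_tdx_cmd cmd = {
		.id = KVM_TDX_TERMINATE_VM,	/* release the HKID */
	};

	/* TDX sub-ioctls are multiplexed through KVM_MEMORY_ENCRYPT_OP */
	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);

	/*
	 * With the TD in TD_TEARDOWN state, dropping the last references
	 * reclaims the pages directly.
	 */
	close(gmem_fd);
	close(vm_fd);
}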
>
> Ability to clean up memslots from userspace without closing
> VM/guest_memfd handles is useful to keep reusing the same guest_memfds
> for the next boot iteration of the VM in case of reboot.
The TD lifecycle does not include reboot. In other words, reboot is
done by shutting down the TD and then starting again with a new TD.

AFAIK it is not currently possible to shut down without closing the
guest_memfds, since each guest_memfd holds a reference (users_count)
to struct kvm, and VM destruction begins only when users_count hits
zero.
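
Simplified, that reference chain looks like the following (condensed
from virt/kvm/guest_memfd.c for illustration; not the verbatim code):

/* KVM_CREATE_GUEST_MEMFD: the new file pins the VM */
static int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
{
	struct kvm_gmem *gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);

	if (!gmem)
		return -ENOMEM;

	kvm_get_kvm(kvm);		/* users_count++ */
	gmem->kvm = kvm;
	/* file/inode setup elided */
	return 0;
}

/* final close() of the guest_memfd */
static int kvm_gmem_release(struct inode *inode, struct file *file)
{
	struct kvm_gmem *gmem = file->private_data;

	kvm_put_kvm(gmem->kvm);		/* users_count--; destroy VM at zero */
	return 0;
}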