Message-ID: <e86aa631-bedd-44b4-b95a-9e941d14b059@intel.com>
Date: Wed, 18 Jun 2025 08:50:04 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>, <kvm@...r.kernel.org>,
	<rick.p.edgecombe@...el.com>, <kirill.shutemov@...ux.intel.com>,
	<kai.huang@...el.com>, <reinette.chatre@...el.com>, <xiaoyao.li@...el.com>,
	<tony.lindgren@...ux.intel.com>, <binbin.wu@...ux.intel.com>,
	<isaku.yamahata@...el.com>, <linux-kernel@...r.kernel.org>,
	<yan.y.zhao@...el.com>, <chao.gao@...el.com>
Subject: Re: [PATCH V4 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM

On 16/06/2025 06:40, Vishal Annapurve wrote:
> On Wed, Jun 11, 2025 at 2:52 AM Adrian Hunter <adrian.hunter@...el.com> wrote:
>>
>> From: Sean Christopherson <seanjc@...gle.com>
>>
>> Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
>> which enables more efficient reclaim of private memory.
>>
>> Private memory is removed from MMU/TDP when guest_memfds are closed. If
>> the HKID has not been released, the TDX VM is still in RUNNABLE state,
>> so pages must be removed using the "Dynamic Page Removal" procedure
>> (refer to the TDX Module Base spec), which involves a number of steps:
>>         Block further address translation
>>         Exit each VCPU
>>         Clear Secure EPT entry
>>         Flush/write-back/invalidate relevant caches
>>
>> However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state
>> where all TDX VM pages are effectively unmapped, so pages can be reclaimed
>> directly.
>>
>> Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
>> reclaim time.  For example:
>>
>>         VCPUs   Size (GB)       Before (secs)   After (secs)
>>          4       18               72             24
>>         32      107              517            134
>>         64      400             5539            467
>>
>> Link: https://lore.kernel.org/r/Z-V0qyTn2bXdrPF7@google.com
>> Link: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@google.com
>> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>> Co-developed-by: Adrian Hunter <adrian.hunter@...el.com>
>> Signed-off-by: Adrian Hunter <adrian.hunter@...el.com>
>> ---
>>
>>
>> Changes in V4:
>>
>>         Drop TDX_FLUSHVP_NOT_DONE change.  It will be done separately.
>>         Use KVM_BUG_ON() instead of WARN_ON().
>>         Correct kvm_trylock_all_vcpus() return value.
>>
>> Changes in V3:
>>
>>         Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would
>>         trigger on the error path from __tdx_td_init()
>>
>>         Put cpus_read_lock() handling back into tdx_mmu_release_hkid()
>>
>>         Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let
>>         tdx_vm_ioctl() deal with kvm->lock
>> ....
>>
>> +static int tdx_terminate_vm(struct kvm *kvm)
>> +{
>> +       if (kvm_trylock_all_vcpus(kvm))
>> +               return -EBUSY;
>> +
>> +       kvm_vm_dead(kvm);
> 
> With this no more VM ioctls can be issued on this instance. How would
> userspace VMM clean up the memslots? Is the expectation that
> guest_memfd and VM fds are closed to actually reclaim the memory?

Yes

> 
> Ability to clean up memslots from userspace without closing
> VM/guest_memfd handles is useful to keep reusing the same guest_memfds
> for the next boot iteration of the VM in case of reboot.

The TD lifecycle does not include reboot.  In other words, reboot is
done by shutting down the TD and then starting again with a new TD.

AFAIK it is not currently possible to shut down without closing the
guest_memfds, since each guest_memfd holds a reference (users_count)
to struct kvm, and destruction only begins when users_count hits zero.

