Message-ID: <aAL4dT1pWG5dDDeo@google.com>
Date: Fri, 18 Apr 2025 18:12:21 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: pbonzini@...hat.com, mlevitsk@...hat.com, kvm@...r.kernel.org,
rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
kai.huang@...el.com, reinette.chatre@...el.com, xiaoyao.li@...el.com,
tony.lindgren@...ux.intel.com, binbin.wu@...ux.intel.com,
isaku.yamahata@...el.com, linux-kernel@...r.kernel.org, yan.y.zhao@...el.com,
chao.gao@...el.com
Subject: Re: [PATCH V2 1/1] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
On Thu, Apr 17, 2025, Adrian Hunter wrote:
> From: Sean Christopherson <seanjc@...gle.com>
>
> Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
> which enables more efficient reclaim of private memory.
>
> Private memory is removed from MMU/TDP when guest_memfds are closed. If
> the HKID has not been released, the TDX VM is still in RUNNABLE state,
> so pages must be removed using the "Dynamic Page Removal" procedure
> (refer to the TDX Module Base spec), which involves a number of steps:
>     Block further address translation
>     Exit each VCPU
>     Clear Secure EPT entry
>     Flush/write-back/invalidate relevant caches
>
> However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state
> where all TDX VM pages are effectively unmapped, so pages can be reclaimed
> directly.
>
> Reclaiming TD pages in the TD_TEARDOWN state was seen to decrease the
> total reclaim time. For example:
>
>   VCPUs   Size (GB)   Before (secs)   After (secs)
>       4          18              72             24
>      32         107             517            134
>      64         400            5539            467
>
> [Adrian: wrote commit message, added KVM_TDX_TERMINATE_VM documentation,
> and moved cpus_read_lock() inside kvm->lock for consistency as reported
> by lockdep]
/facepalm
I over-thought this. We've had a long-standing battle with kvm_lock vs.
cpus_read_lock(), but this is kvm->lock, not kvm_lock. /sigh
> +static int tdx_terminate_vm(struct kvm *kvm)
> +{
> + int r = 0;
> +
> + guard(mutex)(&kvm->lock);
With kvm->lock taken outside cpus_read_lock(), just handle KVM_TDX_TERMINATE_VM
in the switch statement, i.e. let tdx_vm_ioctl() deal with kvm->lock; see the
sketch after the quoted function below.
> + cpus_read_lock();
> +
> + if (!kvm_trylock_all_vcpus(kvm)) {
> + r = -EBUSY;
> + goto out;
> + }
> +
> + kvm_vm_dead(kvm);
> + kvm_unlock_all_vcpus(kvm);
> +
> + __tdx_release_hkid(kvm, true);
> +out:
> + cpus_read_unlock();
> + return r;
> +}
> +
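
I.e. a rough, untested sketch of the above (it keeps the helpers and signatures
from the quoted diff; the lockdep assertion is only illustrative):

static int tdx_terminate_vm(struct kvm *kvm)
{
        int r = 0;

        /* Caller (tdx_vm_ioctl()) is assumed to hold kvm->lock. */
        lockdep_assert_held(&kvm->lock);

        cpus_read_lock();

        if (!kvm_trylock_all_vcpus(kvm)) {
                r = -EBUSY;
                goto out;
        }

        kvm_vm_dead(kvm);
        kvm_unlock_all_vcpus(kvm);

        __tdx_release_hkid(kvm, true);
out:
        cpus_read_unlock();
        return r;
}
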
> int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> {
> struct kvm_tdx_cmd tdx_cmd;
> @@ -2805,6 +2827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> if (tdx_cmd.hw_error)
> return -EINVAL;
>
> + if (tdx_cmd.id == KVM_TDX_TERMINATE_VM)
> + return tdx_terminate_vm(kvm);
> +
> mutex_lock(&kvm->lock);
>
> switch (tdx_cmd.id) {
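
And then the early KVM_TDX_TERMINATE_VM check above goes away in favor of a
normal case in this switch, e.g. (again untested; assumes the existing local
'r' and the other KVM_TDX_* cases are unchanged):

        case KVM_TDX_TERMINATE_VM:
                r = tdx_terminate_vm(kvm);
                break;
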
> --
> 2.43.0
>