[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aIzu4q_7yBmCIOWK@google.com>
Date: Fri, 1 Aug 2025 09:44:18 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
Paolo Bonzini <pbonzini@...hat.com>, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Vishal Annapurve <vannapurve@...gle.com>, Xiaoyao Li <xiaoyao.li@...el.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>, Nikolay Borisov <nik.borisov@...e.com>,
Yan Y Zhao <yan.y.zhao@...el.com>, Kai Huang <kai.huang@...el.com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>
Subject: Re: [PATCH 5/5] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
+Chao
On Fri, Aug 01, 2025, Adrian Hunter wrote:
> On 29/07/2025 22:33, Sean Christopherson wrote:
> > +static int tdx_terminate_vm(struct kvm *kvm)
> > +{
> > + if (kvm_trylock_all_vcpus(kvm))
> > + return -EBUSY;
> > +
> > + kvm_vm_dead(kvm);
> > + to_kvm_tdx(kvm)->vm_terminated = true;
> > +
> > + kvm_unlock_all_vcpus(kvm);
> > +
> > + tdx_mmu_release_hkid(kvm);
> > +
> > + return 0;
> > +}
>
> As I think I mentioned when removing vm_dead first came up,
> I think we need more checks. I spent some time going through
> the code and came up with what is below:
>
> First, we need to avoid TDX VCPU sub-IOCTLs from racing with
> tdx_mmu_release_hkid(). But having any TDX sub-IOCTL run after
> KVM_TDX_TERMINATE_VM raises questions of what might happen, so
> it is much simpler to understand, if that is not possible.
> There are 3 options:
>
> 1. Require that KVM_TDX_TERMINATE_VM is valid only if
> kvm_tdx->state == TD_STATE_RUNNABLE. Since currently all
> the TDX sub-IOCTLs are for initialization, that would block
> the opportunity for any to run after KVM_TDX_TERMINATE_VM.
>
> 2. Check vm_terminated in tdx_vm_ioctl() and tdx_vcpu_ioctl()
>
> 3. Test KVM_REQ_VM_DEAD in tdx_vm_ioctl() and tdx_vcpu_ioctl()
>
> [ Note cannot check is_hkid_assigned() because that is racy ]
>
> Secondly, I suggest we avoid SEAMCALLs that will fail and
> result in KVM_BUG_ON() if HKID has been released.
>
> There are 2 groups of those: MMU-related and TDVPS_ACCESSORS.
>
> For the MMU-related, the following 2 functions should return
> an error immediately if vm_terminated:
>
> tdx_sept_link_private_spt()
> tdx_sept_set_private_spte()
>
> For that not be racy, extra synchronization is needed so that
> vm_terminated can be reliably checked when holding mmu lock
> i.e.
>
> static int tdx_terminate_vm(struct kvm *kvm)
> {
> if (kvm_trylock_all_vcpus(kvm))
> return -EBUSY;
>
> kvm_vm_dead(kvm);
> +
> + write_lock(&kvm->mmu_lock);
> to_kvm_tdx(kvm)->vm_terminated = true;
> + write_unlock(&kvm->mmu_lock);
>
> kvm_unlock_all_vcpus(kvm);
>
> tdx_mmu_release_hkid(kvm);
>
> return 0;
> }
>
> Finally, there are 2 TDVPS_ACCESSORS that need avoiding:
>
> tdx_load_mmu_pgd()
> skip td_vmcs_write64() if vm_terminated
>
> tdx_protected_apic_has_interrupt()
> skip td_state_non_arch_read64() if vm_terminated
Oof. And as Chao pointed out[*], removing the vm_dead check would allow creating
and running vCPUs in a dead VM, which is most definitely not desirable. Squashing
the vCPU creation case is easy enough if we keep vm_dead but still generally allow
ioctls, and it's probably worth doing that no matter what (to plug the hole where
pending vCPU creations could succeed):
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d477a7fda0ae..941d2c32b7dc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4207,6 +4207,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
mutex_lock(&kvm->lock);
+ if (kvm->vm_dead) {
+ r = -EIO;
+ goto unlock_vcpu_destroy;
+ }
+
if (kvm_get_vcpu_by_id(kvm, id)) {
r = -EEXIST;
goto unlock_vcpu_destroy;
And then to ensure vCPUs can't do anything, check KVM_REQ_VM_DEAD after acquiring
vcpu->mutex.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c07dd423458..883077eee4ce 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4433,6 +4433,12 @@ static long kvm_vcpu_ioctl(struct file *filp,
if (mutex_lock_killable(&vcpu->mutex))
return -EINTR;
+
+ if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) {
+ r = -EIO;
+ goto out;
+ }
+
switch (ioctl) {
case KVM_RUN: {
struct pid *oldpid;
That should address all TDVPS paths (I hope), and I _think_ would address all
MMU-related paths as well? E.g. prefault requires a vCPU.
Disallowing (most) vCPU ioctls but not all VM ioctls on vm_dead isn't great ABI
(understatement), but I think we need/want the above changes even if we keep the
general vm_dead restriction. And given the extremely ad hoc behavior of taking
kvm->lock for VM ioctls, trying to enforce vm_dead for "all" VM ioctls seems like
a fool's errand.
So I'm leaning toward keeping "KVM: Reject ioctls only if the VM is bugged, not
simply marked dead" (with a different shortlog+changelog), but keeping vm_dead
(and not introducing kvm_tdx.vm_terminated).
Thoughts?
[*] https://lore.kernel.org/all/aIlzeT+yFG2Tvb3%2F@intel.com
Powered by blists - more mailing lists