Message-ID: <aIkkkaqTbc9vG_x3@google.com>
Date: Tue, 29 Jul 2025 12:44:17 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] KVM: x86: VMX changes for 6.17
On Mon, Jul 28, 2025, Paolo Bonzini wrote:
> On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > Add a sub-ioctl to allow getting TDX VMs into TEARDOWN before the last reference
> > to the VM is put, so that reclaiming the VM's memory doesn't have to jump
> > through all the hoops needed to reclaim memory from a live TD, which are quite
> > costly, especially for large VMs.
> >
> > The following changes since commit 347e9f5043c89695b01e66b3ed111755afcf1911:
> >
> > Linux 6.16-rc6 (2025-07-13 14:25:58 -0700)
> >
> > are available in the Git repository at:
> >
> > https://github.com/kvm-x86/linux.git tags/kvm-x86-vmx-6.17
> >
> > for you to fetch changes up to dcab95e533642d8f733e2562b8bfa5715541e0cf:
> >
> > KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM (2025-07-21 16:23:02 -0700)
>
> I haven't pulled this for now because I wonder if it's better to make
> this a general-purpose ioctl and cap (plus a kvm_x86_ops hook). The
> faster teardown is a TDX module quirk, but for example would it be
> useful if you could trigger kvm_vm_dead() in the selftests?
I'm leaning "no" (leaning is probably an understatement).
Mainly because I think the current behavior of vm_dead is a mistake. Rejecting
all ioctls if kvm->vm_dead is true sounds nice on paper, but in practice it gives
us a false sense of security due to the check happening before acquiring kvm->lock,
e.g. see the SEV-ES migration bug found by syzbot.
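To illustrate the hole (paraphrasing, this isn't the literal code):

  /* kvm_vm_ioctl(), before dispatching to the sub-ioctl handler: */
  if (kvm->mm != current->mm || kvm->vm_dead)
          return -EIO;            /* vm_dead is checked WITHOUT kvm->lock */

  /* ...much later, in the ioctl-specific handler: */
  mutex_lock(&kvm->lock);
  /*
   * Nothing prevents the VM from being marked dead, and state from being
   * torn down, in the window between the unlocked check and this point.
   */
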
Enforcing vm_dead with 100% accuracy would be painful given that there are ioctls
that deliberately avoid kvm->lock (vCPU ioctls could simply check KVM_REQ_VM_DEAD),
and I'm not at all convinced that truly making the VM off-limits is actually
desirable. E.g. it prevents quickly freeing resources by nuking memslots.
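To be concrete on the parenthetical above: for the vCPU ioctl path, something
like this (untested sketch) would be enough, since kvm_vm_dead() already kicks
every vCPU with KVM_REQ_VM_DEAD:

  /* Untested sketch: bail from vCPU ioctls once the VM has been marked dead. */
  if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu))
          return -EIO;
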
I do think it makes sense to reject ioctls if vm_bugged is set, because vm_bugged
is all about limiting the damage when something has already gone wrong, i.e.
providing any kind of ABI is very much a non-goal.
And if the vm_dead behavior is gone, I don't think a generic KVM_TERMINATE_VM
adds much, if any, value. Blocking KVM_RUN isn't terribly interesting, because
VMMs can already accomplish that with signals+immediate_exit, and AFAIK, real-world
use cases don't have problems with KVM_RUN being called at unexpected times.
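E.g. the well-worn userspace pattern, sketched below (SIG_VCPU_KICK and
wait_until_resumed() standing in for whatever signal/helper the VMM already has,
and "run" being the mmap'd struct kvm_run):

  /* VMM thread: fence off KVM_RUN without any new kernel support. */
  run->immediate_exit = 1;                    /* any subsequent KVM_RUN bails */
  pthread_kill(vcpu_thread, SIG_VCPU_KICK);   /* yank a vCPU already in the guest */

  /* vCPU thread: KVM_RUN now returns immediately with EINTR. */
  if (ioctl(vcpu_fd, KVM_RUN, NULL) < 0 && errno == EINTR)
          wait_until_resumed();   /* park until the VMM clears immediate_exit */
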
One thing that we've discussed internally (though not in much depth) is a way to
block accesses to guest memory, e.g. to guard against such accesses
while saving vCPU state during live migration, when the VMM might expect that
guest memory is frozen, i.e. can't be dirtied. But we wouldn't want to terminate
the VM in that case, e.g. so that the VM could be resumed if the migration is
aborted at the last minute.
So I think we want something more along the lines of KVM_PAUSE_VM, with specific
semantics and guarantees.
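Strawman only, none of this exists and the names are invented, but very roughly:

  /* Hypothetical uAPI, purely for illustration. */
  ioctl(vm_fd, KVM_PAUSE_VM, 0);      /* guest memory can no longer be dirtied */
  save_vcpu_and_device_state();       /* VMM does its final migration pass */
  if (migration_aborted)
          ioctl(vm_fd, KVM_RESUME_VM, 0);     /* un-pause, the VM keeps running */
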
As for this pull request, I vote to drop it for 6.17 and give ourselves time to
figure out what we want to do with vm_dead. I want to land "terminate VM" in
some form by 6.18 (as the next LTS), but AFAIK there's no rush to get it into
6.17.
I posted a series with a slightly modified version of the KVM_TDX_TERMINATE_VM
patch[1] to show where I think we should go. We discussed the topic in v4 of the
KVM_TDX_TERMINATE_VM patch[2], but I opted to post it separately (at the time)
because there wasn't a strict dependency.
[1] https://lore.kernel.org/all/20250729193341.621487-1-seanjc@google.com
[2] https://lore.kernel.org/all/aFNa7L74tjztduT-@google.com
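For reference, the intended usage from userspace is roughly the below, assuming
the sub-ioctl stays plumbed through KVM_MEMORY_ENCRYPT_OP like the other
KVM_TDX_* commands (don't hold me to the exact uAPI):

  struct kvm_tdx_cmd cmd = {
          .id = KVM_TDX_TERMINATE_VM,
  };

  ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);  /* TD moves to TEARDOWN */
  /* Memory reclaim (guest_memfd teardown, memslot deletion, VM destruction)
   * no longer has to go through the expensive live-TD reclaim path. */
  close(vm_fd);
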
> As a side effect it would remove the supported_caps field and separate
> namespace for KVM_TDX_CAP_* capabilities, at least for now.