Message-ID: <c4062e02-4b35-e130-b653-e467bef2eb4f@linux.ibm.com>
Date: Tue, 5 Jul 2022 15:30:26 -0400
From: Matthew Rosato <mjrosato@...ux.ibm.com>
To: Tony Krowiak <akrowiak@...ux.ibm.com>, linux-s390@...r.kernel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: jjherne@...ux.ibm.com, borntraeger@...ibm.com, cohuck@...hat.com,
pasic@...ux.ibm.com, pbonzini@...hat.com, frankja@...ux.ibm.com,
imbrenda@...ux.ibm.com, david@...hat.com
Subject: Re: [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and
kvm_destroy_devices()
On 7/5/22 2:54 PM, Tony Krowiak wrote:
> There is a new requirement for s390 secure execution guests that the
> hypervisor ensures all AP queues are reset and disassociated from the
> KVM guest before the secure configuration is torn down. It is the
> responsibility of the vfio_ap device driver to handle this.
>
> Prior to commit ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM"),
> the driver reset all AP queues passed through to a KVM guest when notified
> that the KVM pointer was being set to NULL. Subsequently, the AP queues
> are only reset when the fd for the mediated device used to pass the queues
> through to the guest is closed (the vfio_ap_mdev_close_device() callback).
> This is not a problem when userspace is well-behaved and uses the
> KVM_DEV_VFIO_GROUP_DEL attribute to remove the VFIO group; however, if
> userspace for some reason does not close the mdev fd, a secure execution
> guest will tear down its configuration before the AP queues are
> reset because the teardown is done in the kvm_arch_destroy_vm function
> which is invoked prior to kvm_destroy_devices().
To clarify, even before "vfio: remove VFIO_GROUP_NOTIFY_SET_KVM" if
userspace did not delete the group via KVM_DEV_VFIO_GROUP_DEL then the
old callback would also not have been triggered until
kvm_destroy_devices() anyway (the callback would have been triggered
with a NULL kvm pointer via a call from kvm_vfio_destroy(), triggered
from kvm_destroy_devices()).
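For reference, a rough sketch of the path in question, trimmed down from
kvm_destroy_vm() in virt/kvm/kvm_main.c (not the literal code, just the
ordering that matters here):

static void kvm_destroy_vm(struct kvm *kvm)
{
	...
	kvm_arch_destroy_vm(kvm);	/* s390: secure config torn down here */
	kvm_destroy_devices(kvm);	/* kvm-vfio device destroyed here; this
					 * is the point where the old notifier
					 * fired with a NULL kvm pointer (via
					 * kvm_vfio_destroy()) and vfio_ap
					 * reset its queues */
	...
}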
My point being: this behavior did not start with "vfio: remove
VFIO_GROUP_NOTIFY_SET_KVM"; that patch just removed the notifier since
both actions always took place at device open/close time anyway. So if
destroying the devices before the VM isn't doable, a new
notifier/whatever that sets the KVM association to NULL would also have
to happen at an earlier point in time than VFIO_GROUP_NOTIFY_SET_KVM did
(and should maybe be something that is optional/opt-in and used only by
vfio drivers that need it to clean up a KVM association at a point prior
to the device being destroyed). There should still be no need for any
sort of notifier to set the (non-NULL) KVM association as it's already
associated with the vfio group before device_open.
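Purely as a hypothetical illustration of the opt-in idea (the callback
name and placement are invented here, this is not an existing
interface), it could be something along the lines of an optional hook
in struct vfio_device_ops that the kvm-vfio code invokes before
kvm_arch_destroy_vm():

struct vfio_device_ops {
	...
	/*
	 * Hypothetical, optional hook: called when the KVM association is
	 * being torn down, before the VM itself is destroyed.  Only drivers
	 * like vfio_ap that need to clean up a KVM association early would
	 * implement it.
	 */
	void (*unset_kvm)(struct vfio_device *vdev);
};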
But let's first see if anyone can shed some light on the
ordering between kvm_arch_destroy_vm() and kvm_destroy_devices()...
>
> This patch proposes a simple solution; rather than introducing a new
> notifier into vfio or a callback into KVM, what about reversing the order
> in which kvm_arch_destroy_vm() and kvm_destroy_devices() are called? In
> some very limited testing (i.e., the automated regression tests for
> the vfio_ap device driver) this did not seem to cause any problems.
>
> The question remains, is there a good technical reason why the VM
> is destroyed before the devices it is using? This is not intuitive, so
> this is a request for comments on this proposed patch. The assumption
> here is that the mdev fd will get closed when the devices are destroyed.
>
> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
> ---
> virt/kvm/kvm_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a49df8988cd6..edaf2918be9b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1248,8 +1248,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
> #else
> kvm_flush_shadow_all(kvm);
> #endif
> - kvm_arch_destroy_vm(kvm);
> kvm_destroy_devices(kvm);
> + kvm_arch_destroy_vm(kvm);
> for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
> kvm_free_memslots(kvm, &kvm->__memslots[i][1]);