linux-kernel - Re: [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and kvm_destroy


Message-ID: <c4062e02-4b35-e130-b653-e467bef2eb4f@linux.ibm.com>
Date:   Tue, 5 Jul 2022 15:30:26 -0400
From:   Matthew Rosato <mjrosato@...ux.ibm.com>
To:     Tony Krowiak <akrowiak@...ux.ibm.com>, linux-s390@...r.kernel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     jjherne@...ux.ibm.com, borntraeger@...ibm.com, cohuck@...hat.com,
        pasic@...ux.ibm.com, pbonzini@...hat.com, frankja@...ux.ibm.com,
        imbrenda@...ux.ibm.com, david@...hat.com
Subject: Re: [RFC] kvm: reverse call order of kvm_arch_destroy_vm() and
 kvm_destroy_devices()

On 7/5/22 2:54 PM, Tony Krowiak wrote:
> There is a new requirement for s390 secure execution guests that the
> hypervisor ensures all AP queues are reset and disassociated from the
> KVM guest before the secure configuration is torn down. It is the
> responsibility of the vfio_ap device driver to handle this.
> 
> Prior to commit ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM"),
> the driver reset all AP queues passed through to a KVM guest when notified
> that the KVM pointer was being set to NULL. Since that commit, the AP queues
> are only reset when the fd for the mediated device used to pass the queues
> through to the guest is closed (the vfio_ap_mdev_close_device() callback).
> This is not a problem when userspace is well-behaved and uses the
> KVM_DEV_VFIO_GROUP_DEL attribute to remove the VFIO group; however, if
> userspace for some reason does not close the mdev fd, a secure execution
> guest will tear down its configuration before the AP queues are
> reset because the teardown is done in the kvm_arch_destroy_vm function,
> which is invoked prior to kvm_destroy_devices.

To clarify: even before "vfio: remove VFIO_GROUP_NOTIFY_SET_KVM", if
userspace did not delete the group via KVM_DEV_VFIO_GROUP_DEL, the
old callback would also not have been triggered until
kvm_destroy_devices() anyway (it would have been triggered with a NULL
kvm pointer via a call from kvm_vfio_destroy(), which is itself called
from kvm_destroy_devices()).
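The point above can be made concrete with a small userspace trace of the pre-commit teardown path, assuming userspace never issued KVM_DEV_VFIO_GROUP_DEL. Every `model_` function is a hypothetical stand-in that only logs its own name; the real kvm_vfio_destroy() and the removed VFIO_GROUP_NOTIFY_SET_KVM notifier do considerably more.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event log: each stand-in records that it ran. */
#define MODEL_MAX_EVENTS 8
static const char *model_log[MODEL_MAX_EVENTS];
static size_t model_nlog;

static void model_record(const char *event)
{
	assert(model_nlog < MODEL_MAX_EVENTS);
	model_log[model_nlog++] = event;
}

/* Stand-in for the old notifier: a NULL kvm meant "drop the
 * association" (and, for vfio_ap, reset the AP queues). */
static void model_notify_set_kvm(void *kvm)
{
	model_record(kvm ? "notify_set_kvm(kvm)" : "notify_set_kvm(NULL)");
}

/* Stand-in for kvm_vfio_destroy(): without a prior GROUP_DEL, this is
 * where the group was finally told that the KVM pointer was gone. */
static void model_kvm_vfio_destroy(void)
{
	model_record("kvm_vfio_destroy");
	model_notify_set_kvm(NULL);
}

static void model_kvm_destroy_devices(void)
{
	model_record("kvm_destroy_devices");
	model_kvm_vfio_destroy();
}

static void model_kvm_arch_destroy_vm(void)
{
	model_record("kvm_arch_destroy_vm");
}

/* Original ordering in kvm_destroy_vm(): arch teardown first. */
static void model_kvm_destroy_vm(void)
{
	model_kvm_arch_destroy_vm();
	model_kvm_destroy_devices();
}

/* Position of an event in the log, or -1 if it never happened. */
static int model_index_of(const char *event)
{
	for (size_t i = 0; i < model_nlog; i++)
		if (strcmp(model_log[i], event) == 0)
			return (int)i;
	return -1;
}
```

Calling model_kvm_destroy_vm() leaves "kvm_arch_destroy_vm" at index 0 and "notify_set_kvm(NULL)" at index 3, i.e. the NULL notification already arrived only after the arch teardown.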

My point being: this behavior did not start with "vfio: remove
VFIO_GROUP_NOTIFY_SET_KVM"; that patch just removed the notifier since
both actions always took place at device open/close time anyway.  So if
destroying the devices before the vm isn't doable, a new
notifier/whatever that sets the KVM association to NULL would also have
to happen at an earlier point in time than VFIO_GROUP_NOTIFY_SET_KVM did
(and should maybe be something that is optional/opt-in and used only by
vfio drivers that need it to clean up a KVM association at a point prior
to the device being destroyed).  There should still be no need for any
sort of notifier to set the (non-NULL) KVM association, as it's already
associated with the vfio group before device_open.
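One possible shape for the opt-in alternative described above, sketched as a userspace model. No such hook exists in KVM or VFIO: `struct model_predestroy_hook`, `model_register_predestroy()` and the vfio_ap-like user are invented names, and a real implementation would also need locking and unregistration.

```c
#include <assert.h>
#include <stddef.h>

struct model_kvm;

/* Hypothetical opt-in callback; drivers that do not need it never register. */
struct model_predestroy_hook {
	void (*fn)(struct model_kvm *kvm, void *data);
	void *data;
	struct model_predestroy_hook *next;
};

struct model_kvm {
	struct model_predestroy_hook *hooks; /* opted-in drivers */
	int arch_destroyed;                  /* arch teardown has run */
};

static void model_register_predestroy(struct model_kvm *kvm,
				      struct model_predestroy_hook *h)
{
	assert(h != NULL && h->fn != NULL);
	h->next = kvm->hooks;
	kvm->hooks = h;
}

static void model_destroy_vm(struct model_kvm *kvm)
{
	/* Opted-in drivers drop their KVM association first... */
	for (struct model_predestroy_hook *h = kvm->hooks; h; h = h->next)
		h->fn(kvm, h->data);
	/* ...then the kvm_arch_destroy_vm() stand-in runs; device
	 * destruction would follow in its existing position. */
	kvm->arch_destroyed = 1;
}

/* Hypothetical user: a vfio_ap-like driver that resets its queues. */
struct model_ap_state {
	int queues_reset;
	int reset_after_arch_teardown;
};

static void model_ap_predestroy(struct model_kvm *kvm, void *data)
{
	struct model_ap_state *ap = data;

	ap->queues_reset = 1;
	ap->reset_after_arch_teardown = kvm->arch_destroyed;
}
```

The design choice here is that the existing kvm_destroy_vm() ordering stays untouched; only drivers that register a hook get an earlier cleanup point.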

But let's first see if anyone can shed some understanding on the 
ordering between kvm_arch_destroy_vm and kvm_destroy_devices...

> 
> This patch proposes a simple solution: rather than introducing a new
> notifier into vfio or a callback into KVM, what about reversing the order
> in which kvm_arch_destroy_vm and kvm_destroy_devices are called? In
> some very limited testing (i.e., the automated regression tests for
> the vfio_ap device driver), this did not seem to cause any problems.
> 
> The question remains: is there a good technical reason why the VM
> is destroyed before the devices it is using? This is not intuitive, so
> this is a request for comments on this proposed patch. The assumption
> here is that the mdev fd will get closed when the devices are destroyed.
> 
> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
> ---
>   virt/kvm/kvm_main.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a49df8988cd6..edaf2918be9b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1248,8 +1248,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
>   #else
>   	kvm_flush_shadow_all(kvm);
>   #endif
> -	kvm_arch_destroy_vm(kvm);
>   	kvm_destroy_devices(kvm);
> +	kvm_arch_destroy_vm(kvm);
>   	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>   		kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
>   		kvm_free_memslots(kvm, &kvm->__memslots[i][1]);
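A userspace model of what the one-line diff changes, under the RFC's own assumption that destroying the devices closes the mdev fd (and thereby resets the AP queues). `struct model_guest` and the `model_` functions are hypothetical; they track only a few flags and ignore everything else kvm_destroy_vm() does (shadow flush, memslot freeing, reference counting).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of one secure execution guest with AP queues
 * passed through; not a kernel structure. */
struct model_guest {
	bool secure_config;    /* secure configuration still in place */
	bool ap_queues_bound;  /* AP queues still associated with the guest */
	bool requirement_met;  /* queues were reset before config teardown */
};

/* Stand-in for kvm_destroy_devices(). Assumed (as stated in the RFC):
 * this ends up closing the mdev fd, and vfio_ap_mdev_close_device()
 * resets the AP queues. */
static void model_destroy_devices(struct model_guest *g)
{
	g->ap_queues_bound = false;
}

/* Stand-in for the part of kvm_arch_destroy_vm() that tears down the
 * secure configuration; records whether the queues were already reset. */
static void model_arch_destroy_vm(struct model_guest *g)
{
	g->requirement_met = !g->ap_queues_bound;
	g->secure_config = false;
}

/* Models kvm_destroy_vm() before (devices_first == false) and after
 * (devices_first == true) the patch. */
static void model_destroy_vm(struct model_guest *g, bool devices_first)
{
	assert(g->secure_config && g->ap_queues_bound);
	if (devices_first) {
		model_destroy_devices(g);
		model_arch_destroy_vm(g);
	} else {
		model_arch_destroy_vm(g);
		model_destroy_devices(g);
	}
}
```

The model cannot answer the RFC's actual question, namely whether some architecture's kvm_arch_destroy_vm() must run before device destruction; it only shows why the new order satisfies the secure execution requirement when the assumption holds.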