lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y0cm3dJmgnpLgswQ@google.com>
Date:   Wed, 12 Oct 2022 20:43:09 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     isaku.yamahata@...el.com
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>,
        isaku.yamahata@...il.com, Kai Huang <kai.huang@...el.com>,
        Chao Gao <chao.gao@...el.com>,
        Atish Patra <atishp@...shpatra.org>,
        Shaokun Zhang <zhangshaokun@...ilicon.com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Huang Ying <ying.huang@...el.com>,
        Huacai Chen <chenhuacai@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v5 10/30] KVM: Add arch hooks when VM is added/deleted

On Thu, Sep 22, 2022, isaku.yamahata@...el.com wrote:
> From: Isaku Yamahata <isaku.yamahata@...el.com>
> 
> and pass kvm_usage_count with kvm_lock.  Move kvm_arch_post_init_vm() under
> kvm_arch_add_vm().  Replace enable/disable_hardware_all() with the default
> implementation of kvm_arch_add/del_vm().  Later kvm_arch_post_init_vm() is
> deleted once x86 overrides kvm_arch_add_vm().

This needs to explain _why_ KVM is pivoting to add/remove hooks.

> Suggested-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> ---
>  include/linux/kvm_host.h |   2 +
>  virt/kvm/kvm_main.c      | 121 ++++++++++++++++++++-------------------
>  2 files changed, 65 insertions(+), 58 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index eab352902de7..3fbb01bbac98 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1445,6 +1445,8 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>  bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
>  bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_post_init_vm(struct kvm *kvm);
> +int kvm_arch_add_vm(struct kvm *kvm, int usage_count);
> +int kvm_arch_del_vm(int usage_count);
>  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
>  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c4b908553726..e2c8823786ff 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -142,8 +142,9 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
>  #define KVM_COMPAT(c)	.compat_ioctl	= kvm_no_compat_ioctl,	\
>  			.open		= kvm_no_compat_open
>  #endif
> -static int hardware_enable_all(void);
> -static void hardware_disable_all(void);
> +static void hardware_enable_nolock(void *junk);
> +static void hardware_disable_nolock(void *junk);
> +static void kvm_del_vm(void);

I think kvm_remove_vm() will be less confusing as "remove" is almost never used
to describe freeing something, whereas "delete" is somewhat interchangeable with
"free.  I.e. make it more obvious that the hook isn't intented to actually
delete/free a VM, rather it's there to remove/delete a VM from global tracking.

Ah, and this is especially true since the VM needs to be deleted from vm_list
before the is destroyed, but hardware needs to stay enabled until the VM is fully
destroyed.

Hmm, actually, I think even better would be to have kvm_remove_vm() and kvm_drop_vm()
to make it more obvious that there isn't 100% symmetry between "add" and "remove".

E.g. rename kvm_arch_pre_destroy_vm() => kvm_arch_remove_vm() and then we end up
with (see comments below for more details):

static int kvm_add_vm(struct kvm *kvm)
{
	/*
	 * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
	 * is called, i.e. on_each_cpu() includes CPUs that have not yet been
	 * onlined by KVM.  Disable CPU hotplug to prevent enabling hardware on
	 * a CPU that hasn't yet done compatibility checks.
	 */
	cpus_read_lock();
	mutex_lock(&kvm_lock);
	r = kvm_arch_add_vm(kvm, ++kvm_usage_count);
	if (r) {
		--kvm_usage_count;
		goto out;
	}
	
	list_add(&kvm->vm_list, &vm_list);

out:
	mutex_unlock(&kvm_lock);
	cpus_read_unlock();
}

static void kvm_remove_vm(struct kvm *kvm)
{
	mutex_lock(&kvm_lock);
	list_del(&kvm->vm_list);
	mutex_unlock(&kvm_lock);
	kvm_arch_remove_vm(kvm);
}

static void kvm_drop_vm(void)
{
	cpus_read_lock();
	mutex_lock(&kvm_lock);
	WARN_ON_ONCE(!kvm_usage_count);
	kvm_usage_count--;
	kvm_arch_drop_vm(kvm_usage_count);
	mutex_unlock(&kvm_lock);
	cpus_read_unlock();
}

> @@ -1223,13 +1255,28 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>  	if (r)
>  		goto out_err_no_debugfs;
>  
> -	r = kvm_arch_post_init_vm(kvm);
> -	if (r)
> -		goto out_err;
> -
> +	/*
> +	 * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> +	 * is called. on_each_cpu() between them includes the CPU. As a result,
> +	 * hardware_enable_nolock() may get invoked before kvm_online_cpu().
> +	 * This would enable hardware virtualization on that cpu without
> +	 * compatibility checks, which can potentially crash system or break
> +	 * running VMs.
> +	 *
> +	 * Disable CPU hotplug to prevent this case from happening.
> +	 */
> +	cpus_read_lock();
>  	mutex_lock(&kvm_lock);
> +	kvm_usage_count++;
> +	r = kvm_arch_add_vm(kvm, kvm_usage_count);
> +	if (r) {
> +		/* the following kvm_del_vm() decrements kvm_usage_count. */

This is buggy on two fronts.  kvm_usage_count needs to be protected with kvm_lock,
and AFAICT cpus_read_unlock() isn't called.

Invoking kvm_del_vm() in the error path is the source of both bugs.  Typically,
the paired "undo" of an operation should only be used if the "do" operation was
fully successful.

As above, move this to a helper to make juggling the locks less painful.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ