Message-ID: <cdf00201-b24c-0337-a49f-01df61a45fd1@redhat.com>
Date: Wed, 17 Mar 2021 14:15:38 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Alexander Graf <graf@...zon.com>,
Yuan Yao <yaoyuan0329os@...il.com>
Subject: Re: [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and
set atomically-ish
On 16/03/21 19:44, Sean Christopherson wrote:
> Fix a plethora of issues with MSR filtering by installing the resulting
> filter as an atomic bundle instead of updating the live filter one range
> at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
> the hardware MSR bitmaps won't be updated until the next VM-Enter, but
> the relevant software struct is atomically updated, which is what KVM
> really needs.
>
> Similar to the approach used for modifying memslots, make arch.msr_filter
> a SRCU-protected pointer, do all the work configuring the new filter
> outside of kvm->lock, and then acquire kvm->lock only when the new filter
> has been vetted and created. That way vCPU readers either see the old
> filter or the new filter in their entirety, not some half-baked state.
>
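
For readers following along, the "build off to the side, then publish" idea in the paragraph above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: C11 atomics stand in for rcu_assign_pointer()/srcu_dereference(), the SRCU grace period is left out entirely, and every name (demo_filter, filter_build, filter_publish, filter_snapshot, the "reject empty ranges" rule) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_RANGES 16

/* Hypothetical stand-in for struct kvm_x86_msr_filter (bitmaps omitted). */
struct demo_filter {
	uint8_t count;
	bool default_allow;
	uint32_t base[DEMO_MAX_RANGES];
	uint32_t nmsrs[DEMO_MAX_RANGES];
};

/* Stand-in for kvm->arch.msr_filter; NULL means "no filter installed". */
static _Atomic(struct demo_filter *) live_filter;

/*
 * Build a complete filter off to the side. Returns NULL on any failure,
 * in which case the live filter has not been touched at all.
 */
struct demo_filter *filter_build(bool default_allow, const uint32_t *bases,
				 const uint32_t *nmsrs, size_t n)
{
	struct demo_filter *f;
	size_t i;

	assert(n == 0 || (bases && nmsrs));
	if (n > DEMO_MAX_RANGES)
		return NULL;

	f = calloc(1, sizeof(*f));
	if (!f)
		return NULL;

	f->default_allow = default_allow;
	for (i = 0; i < n; i++) {
		/* Hypothetical validation rule, only for this sketch. */
		if (!nmsrs[i]) {
			free(f);
			return NULL;
		}
		f->base[f->count] = bases[i];
		f->nmsrs[f->count] = nmsrs[i];
		f->count++;
	}
	return f;
}

/*
 * Publish the new filter with a single pointer exchange and hand back the
 * old one. The kernel waits for an SRCU grace period before freeing the old
 * filter; this sketch leaves reclamation to the caller.
 */
struct demo_filter *filter_publish(struct demo_filter *new_filter)
{
	return atomic_exchange_explicit(&live_filter, new_filter,
					memory_order_acq_rel);
}

/* Reader side: load the pointer once, then use only that snapshot. */
const struct demo_filter *filter_snapshot(void)
{
	return atomic_load_explicit(&live_filter, memory_order_acquire);
}
```

A reader that takes one snapshot sees either the old filter or the new one in full, and a failed filter_build() leaves the installed filter exactly as it was.
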
> Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
> TOCTOU bug[*], but that's just the tip of the iceberg...
>
> - Nothing is __rcu annotated, making it nigh impossible to audit the
>   code for correctness.
> - kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel
>   coding style aside, the lack of an smp_rmb() anywhere casts all code
>   into doubt.
> - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
>   count before taking the lock.
> - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
>
> The entire approach of updating the live filter is also flawed. While
> installing a new filter is inherently racy if vCPUs are running, fixing
> the above issues also makes it trivial to ensure certain behavior is
> deterministic, e.g. KVM can provide deterministic behavior for MSRs with
> identical settings in the old and new filters. An atomic update of the
> filter also prevents KVM from getting into a half-baked state, e.g. if
> installing a filter fails, the existing approach would leave the live
> filter partially updated, having already committed whatever ranges were
> processed before the failure.
>
> [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
>
> Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
> Cc: stable@...r.kernel.org
> Cc: Alexander Graf <graf@...zon.com>
> Reported-by: Yuan Yao <yaoyuan0329os@...il.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> Documentation/virt/kvm/api.rst | 6 +-
> arch/x86/include/asm/kvm_host.h | 17 ++---
> arch/x86/kvm/x86.c | 109 +++++++++++++++++++-------------
> 3 files changed, 78 insertions(+), 54 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 38e327d4b479..2898d3e86b08 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4806,8 +4806,10 @@ If an MSR access is not permitted through the filtering, it generates a
> allows user space to deflect and potentially handle various MSR accesses
> into user space.
>
> -If a vCPU is in running state while this ioctl is invoked, the vCPU may
> -experience inconsistent filtering behavior on MSR accesses.
> +Note, invoking this ioctl while a vCPU is running is inherently racy. However,
> +KVM does guarantee that vCPUs will see either the previous filter or the new
> +filter, e.g. MSRs with identical settings in both the old and new filter will
> +have deterministic behavior.
>
> 4.127 KVM_XEN_HVM_SET_ATTR
> --------------------------
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a52f973bdff6..84198c403a48 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -931,6 +931,12 @@ enum kvm_irqchip_mode {
> KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
> };
>
> +struct kvm_x86_msr_filter {
> + u8 count;
> + bool default_allow:1;
> + struct msr_bitmap_range ranges[16];
> +};
> +
> #define APICV_INHIBIT_REASON_DISABLE 0
> #define APICV_INHIBIT_REASON_HYPERV 1
> #define APICV_INHIBIT_REASON_NESTED 2
> @@ -1025,16 +1031,11 @@ struct kvm_arch {
> bool guest_can_read_msr_platform_info;
> bool exception_payload_enabled;
>
> + bool bus_lock_detection_enabled;
> +
> /* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
> u32 user_space_msr_mask;
> -
> - struct {
> - u8 count;
> - bool default_allow:1;
> - struct msr_bitmap_range ranges[16];
> - } msr_filter;
> -
> - bool bus_lock_detection_enabled;
> + struct kvm_x86_msr_filter __rcu *msr_filter;
>
> struct kvm_pmu_event_filter __rcu *pmu_event_filter;
> struct task_struct *nx_lpage_recovery_thread;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a9d95f90a048..c55769620b9a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1529,35 +1529,44 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
>
> bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
> {
> + struct kvm_x86_msr_filter *msr_filter;
> + struct msr_bitmap_range *ranges;
> struct kvm *kvm = vcpu->kvm;
> - struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
> - u32 count = kvm->arch.msr_filter.count;
> - u32 i;
> - bool r = kvm->arch.msr_filter.default_allow;
> + bool allowed;
> int idx;
> + u32 i;
>
> - /* MSR filtering not set up or x2APIC enabled, allow everything */
> - if (!count || (index >= 0x800 && index <= 0x8ff))
> + /* x2APIC MSRs do not support filtering. */
> + if (index >= 0x800 && index <= 0x8ff)
> return true;
>
> - /* Prevent collision with set_msr_filter */
> idx = srcu_read_lock(&kvm->srcu);
>
> - for (i = 0; i < count; i++) {
> + msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
> + if (!msr_filter) {
> + allowed = true;
> + goto out;
> + }
> +
> + allowed = msr_filter->default_allow;
> + ranges = msr_filter->ranges;
> +
> + for (i = 0; i < msr_filter->count; i++) {
> u32 start = ranges[i].base;
> u32 end = start + ranges[i].nmsrs;
> u32 flags = ranges[i].flags;
> unsigned long *bitmap = ranges[i].bitmap;
>
> if ((index >= start) && (index < end) && (flags & type)) {
> - r = !!test_bit(index - start, bitmap);
> + allowed = !!test_bit(index - start, bitmap);
> break;
> }
> }
>
> +out:
> srcu_read_unlock(&kvm->srcu, idx);
>
> - return r;
> + return allowed;
> }
> EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>
> @@ -5389,25 +5398,34 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> return r;
> }
>
> -static void kvm_clear_msr_filter(struct kvm *kvm)
> +static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
> +{
> + struct kvm_x86_msr_filter *msr_filter;
> +
> + msr_filter = kzalloc(sizeof(*msr_filter), GFP_KERNEL_ACCOUNT);
> + if (!msr_filter)
> + return NULL;
> +
> + msr_filter->default_allow = default_allow;
> + return msr_filter;
> +}
> +
> +static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
> {
> u32 i;
> - u32 count = kvm->arch.msr_filter.count;
> - struct msr_bitmap_range ranges[16];
>
> - mutex_lock(&kvm->lock);
> - kvm->arch.msr_filter.count = 0;
> - memcpy(ranges, kvm->arch.msr_filter.ranges, count * sizeof(ranges[0]));
> - mutex_unlock(&kvm->lock);
> - synchronize_srcu(&kvm->srcu);
> + if (!msr_filter)
> + return;
>
> - for (i = 0; i < count; i++)
> - kfree(ranges[i].bitmap);
> + for (i = 0; i < msr_filter->count; i++)
> + kfree(msr_filter->ranges[i].bitmap);
> +
> + kfree(msr_filter);
> }
>
> -static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user_range)
> +static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
> + struct kvm_msr_filter_range *user_range)
> {
> - struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
> struct msr_bitmap_range range;
> unsigned long *bitmap = NULL;
> size_t bitmap_size;
> @@ -5441,11 +5459,9 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
> goto err;
> }
>
> - /* Everything ok, add this range identifier to our global pool */
> - ranges[kvm->arch.msr_filter.count] = range;
> - /* Make sure we filled the array before we tell anyone to walk it */
> - smp_wmb();
> - kvm->arch.msr_filter.count++;
> + /* Everything ok, add this range identifier. */
> + msr_filter->ranges[msr_filter->count] = range;
> + msr_filter->count++;
>
> return 0;
> err:
> @@ -5456,10 +5472,11 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
> static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
> {
> struct kvm_msr_filter __user *user_msr_filter = argp;
> + struct kvm_x86_msr_filter *new_filter, *old_filter;
> struct kvm_msr_filter filter;
> bool default_allow;
> - int r = 0;
> bool empty = true;
> + int r = 0;
> u32 i;
>
> if (copy_from_user(&filter, user_msr_filter, sizeof(filter)))
> @@ -5472,25 +5489,32 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
> if (empty && !default_allow)
> return -EINVAL;
>
> - kvm_clear_msr_filter(kvm);
> + new_filter = kvm_alloc_msr_filter(default_allow);
> + if (!new_filter)
> + return -ENOMEM;
>
> - kvm->arch.msr_filter.default_allow = default_allow;
> -
> - /*
> - * Protect from concurrent calls to this function that could trigger
> - * a TOCTOU violation on kvm->arch.msr_filter.count.
> - */
> - mutex_lock(&kvm->lock);
> for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) {
> - r = kvm_add_msr_filter(kvm, &filter.ranges[i]);
> - if (r)
> - break;
> + r = kvm_add_msr_filter(new_filter, &filter.ranges[i]);
> + if (r) {
> + kvm_free_msr_filter(new_filter);
> + return r;
> + }
> }
>
> + mutex_lock(&kvm->lock);
> +
> + /* The per-VM filter is protected by kvm->lock... */
> + old_filter = srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1);
> +
> + rcu_assign_pointer(kvm->arch.msr_filter, new_filter);
> + synchronize_srcu(&kvm->srcu);
> +
> + kvm_free_msr_filter(old_filter);
> +
> kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
> mutex_unlock(&kvm->lock);
>
> - return r;
> + return 0;
> }
>
> long kvm_arch_vm_ioctl(struct file *filp,
> @@ -10693,8 +10717,6 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
>
> void kvm_arch_destroy_vm(struct kvm *kvm)
> {
> - u32 i;
> -
> if (current->mm == kvm->mm) {
> /*
> * Free memory regions allocated on behalf of userspace,
> @@ -10710,8 +10732,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> mutex_unlock(&kvm->slots_lock);
> }
> static_call_cond(kvm_x86_vm_destroy)(kvm);
> - for (i = 0; i < kvm->arch.msr_filter.count; i++)
> - kfree(kvm->arch.msr_filter.ranges[i].bitmap);
> + kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
> kvm_pic_destroy(kvm);
> kvm_ioapic_destroy(kvm);
> kvm_free_vcpus(kvm);
>
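
As a reading aid for the kvm_msr_allowed() hunk above, the lookup rule can be modelled in plain userspace C. This is a sketch under assumptions rather than the kernel implementation: the SRCU read-side section is omitted, the bitmap is a byte array instead of unsigned long words tested with test_bit(), and the names model_range, model_filter, msr_allowed, FILTER_READ and FILTER_WRITE are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-type flags for this model. */
#define FILTER_READ  (1u << 0)
#define FILTER_WRITE (1u << 1)

struct model_range {
	uint32_t flags;		/* access types this range applies to */
	uint32_t base;		/* first MSR index covered */
	uint32_t nmsrs;		/* number of MSRs covered */
	const uint8_t *bitmap;	/* one bit per MSR, 1 = allowed */
};

struct model_filter {
	uint8_t count;
	bool default_allow;
	struct model_range ranges[16];
};

bool msr_allowed(const struct model_filter *f, uint32_t index, uint32_t type)
{
	uint32_t i;

	assert(type == FILTER_READ || type == FILTER_WRITE);

	/* x2APIC MSRs do not support filtering. */
	if (index >= 0x800 && index <= 0x8ff)
		return true;

	/* No filter installed: allow everything. */
	if (!f)
		return true;

	for (i = 0; i < f->count; i++) {
		const struct model_range *r = &f->ranges[i];
		uint32_t off = index - r->base;

		/* The first range matching index and access type decides. */
		if (index >= r->base && off < r->nmsrs && (r->flags & type))
			return (r->bitmap[off / 8] >> (off % 8)) & 1;
	}

	/* No range matched: fall back to the filter's default. */
	return f->default_allow;
}
```

Because the whole lookup runs against a single filter pointer, an MSR whose settings are identical in the old and new filters gets the same answer no matter which of the two a vCPU happens to see.
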
Reviewed-by: Paolo Bonzini <pbonzini@...hat.com>