[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8889dc3b-d672-41c3-8d11-e88861b7b38e@redhat.com>
Date: Sun, 20 Oct 2024 13:23:46 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Will Deacon <will@...nel.org>, Michal Luczaj <mhal@...x.co>,
Alexander Potapenko <glider@...gle.com>, Marc Zyngier <maz@...nel.org>,
Oliver Upton <oliver.upton@...ux.dev>
Subject: Re: [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"
On 10/10/24 19:48, Sean Christopherson wrote:
> On Thu, Oct 10, 2024, Paolo Bonzini wrote:
>> On 10/9/24 17:04, Sean Christopherson wrote:
>>> Now that KVM loads from vcpu_array if and only if the target index is
>>> valid with respect to online_vcpus, i.e. now that it is safe to erase a
>>> not-fully-onlined vCPU entry, revert to storing into vcpu_array before
>>> success is guaranteed.
>>>
>>> If xa_store() fails, which _should_ be impossible, then putting the vCPU's
>>> reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
>>> been installed and owns the vCPU's reference.
>>>
>>> This was found by inspection, but forcing the xa_store() to fail
>>> confirms the problem:
>>>
>>> | Unable to handle kernel paging request at virtual address ffff800080ecd960
>>> | Call trace:
>>> | _raw_spin_lock_irq+0x2c/0x70
>>> | kvm_irqfd_release+0x24/0xa0
>>> | kvm_vm_release+0x1c/0x38
>>> | __fput+0x88/0x2ec
>>> | ____fput+0x10/0x1c
>>> | task_work_run+0xb0/0xd4
>>> | do_exit+0x210/0x854
>>> | do_group_exit+0x70/0x98
>>> | get_signal+0x6b0/0x73c
>>> | do_signal+0xa4/0x11e8
>>> | do_notify_resume+0x60/0x12c
>>> | el0_svc+0x64/0x68
>>> | el0t_64_sync_handler+0x84/0xfc
>>> | el0t_64_sync+0x190/0x194
>>> | Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)
>>>
>>> Practically speaking, this is a non-issue as xa_store() can't fail, absent
>>> a nasty kernel bug. But the code is visually jarring and technically
>>> broken.
>>>
>>> This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.
>>>
>>> Cc: Paolo Bonzini <pbonzini@...hat.com>
>>> Cc: Michal Luczaj <mhal@...x.co>
>>> Cc: Alexander Potapenko <glider@...gle.com>
>>> Cc: Marc Zyngier <maz@...nel.org>
>>> Reported-by: Will Deacon <will@...nel.org>
>>> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
>>> ---
>>> virt/kvm/kvm_main.c | 14 +++++---------
>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index fca9f74e9544..f081839521ef 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>>> }
>>> vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
>>> - r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
>>> + r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
>>> + BUG_ON(r == -EBUSY);
>>> if (r)
>>> goto unlock_vcpu_destroy;
>>> @@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>>> kvm_get_kvm(kvm);
>>> r = create_vcpu_fd(vcpu);
>>> if (r < 0)
>>> - goto kvm_put_xa_release;
>>> -
>>> - if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
>>> - r = -EINVAL;
>>> - goto kvm_put_xa_release;
>>> - }
>>> + goto kvm_put_xa_erase;
>>
>> I also find it a bit jarring though that we have to undo the insertion. This
>> is a chicken-and-egg situation where you are pick one operation B that will
>> have to undo operation A if it fails. But what xa_store is doing, is
>> breaking this deadlock.
>>
>> The code is a bit longer, sure, but I don't see the point in complicating
>> the vcpu_array invariants and letting an entry disappear.
>
> But we only need one rule: vcpu_array[x] is valid if and only if 'x' is less than
> online_vcpus. And that rule is necessary regardless of whether or not vcpu_array[x]
> is filled before success is guaranteed.
Even if the invariant is explainable I still find xa_erase to be uglier
than xa_release, but maybe it's just me.
The reason I'm not fully convinced by the explanation is that...
> I'm not concerned about the code length, it's that we need to do _something_ if
> xa_store() fails. Yeah, it should never happen, but knowingly doing nothing feels
> all kinds of wrong.
... it seems to me that this is not just an issue in KVM code; it should
apply to other uses of xa_reserve()/xa_store() as well. If xa_store()
fails after xa_reserve(), you're pretty much using the xarray API
incorrectly... and then, just make it a BUG()? I know that BUG() is
frowned upon, but if the API causes invalid memory accesses when used
incorrectly, one might as well fail as early as possible and before the
invalid memory access becomes exploitable.
> I don't like BUG(), because it's obviously very doable to
> gracefully handle failure.
Yes, you can by using a different API. But the point is that in the
reserve/store case the insert failure becomes a reserve failure, never a
store failure.
Maybe there should be an xa_store_reserved() that BUGs on failure, I
don't know.
Paolo
Powered by blists - more mailing lists