Message-ID: <Zj5C2Psbm8EY+Q4F@gmail.com>
Date: Fri, 10 May 2024 08:52:56 -0700
From: Breno Leitao <leitao@...ian.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, rbc@...a.com, paulmck@...nel.org,
"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: Addressing a possible race in kvm_vcpu_on_spin:
On Fri, May 10, 2024 at 07:39:14AM -0700, Sean Christopherson wrote:
> On Fri, May 10, 2024, Breno Leitao wrote:
> > > IMO, reworking it to be like this is more straightforward:
> > >
> > > int nr_vcpus, start, i, idx, yielded;
> > > struct kvm *kvm = me->kvm;
> > > struct kvm_vcpu *vcpu;
> > > int try = 3;
> > >
> > > nr_vcpus = atomic_read(&kvm->online_vcpus);
> > > if (nr_vcpus < 2)
> > > return;
> > >
> > > /* Pairs with the smp_wmb() in kvm_vm_ioctl_create_vcpu(). */
> > > smp_rmb();
> >
> > Why do you need this now? Isn't the RCU read lock in xa_load() enough?
>
> No. RCU read lock doesn't suffice, because on kernels without PREEMPT_COUNT
> rcu_read_lock() may be a literal nop. There may be a _compiler_ barrier, but
> smp_rmb() requires more than a compiler barrier on many architectures.
Makes sense. In fact, it seems better to have an explicit barrier between
the xarray modify operations and reading/storing online_vcpus.
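For reference, the writer side of that pairing would look roughly like the
sketch below. This approximates what kvm_vm_ioctl_create_vcpu() does
upstream, reconstructed from memory rather than quoted verbatim, with the
error handling elided:

	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu,
		      GFP_KERNEL_ACCOUNT);
	if (r)
		return r;	/* error handling elided in this sketch */

	/*
	 * Publish the xarray entry before the updated count, so that a
	 * reader who observes the new online_vcpus value also observes
	 * the vCPU pointer.  Pairs with the smp_rmb() issued after
	 * atomic_read(&kvm->online_vcpus) on the read side.
	 */
	smp_wmb();
	atomic_inc(&kvm->online_vcpus);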
> > > kvm_vcpu_set_in_spin_loop(me, true);
> > >
> > > start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
> > > for (i = 0; i < nr_vcpus; i++) {
> >
> > Why do you need to start at the last boosted vCPU? I.e., why not
> > start at 0 and skip me->vcpu_idx and kvm->last_boosted_vcpu?
>
> To provide round-robin style yielding in order to (hopefully) yield to the vCPU
> that is holding a spinlock (or some other asset that is causing a vCPU to spin
> in kernel mode).
>
> E.g. if there are 4 vCPUs all running on a single CPU, vCPU3 gets preempted while
> holding a spinlock, and all vCPUs are contending for said spinlock, then starting
> at vCPU0 every time would result in vCPU1 yielding to vCPU0, and vCPU0 yielding
> back to vCPU1, indefinitely.
Makes sense, this would always privilege vCPU0 at the expense of the
later vCPUs. 100% clear. Thanks!
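For the archive, the complete round-robin walk being described would look
something like the sketch below. This is my paraphrase of Sean's snippet
above, with the per-vCPU eligibility checks of the real kvm_vcpu_on_spin()
elided, so treat it as illustrative rather than a literal patch:

	kvm_vcpu_set_in_spin_loop(me, true);

	start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
	for (i = 0; i < nr_vcpus; i++) {
		/* Wrap around so the scan starts after the last boostee. */
		idx = (start + i) % nr_vcpus;
		if (idx == me->vcpu_idx)
			continue;

		vcpu = xa_load(&kvm->vcpu_array, idx);
		if (!READ_ONCE(vcpu->ready))
			continue;

		/* Eligibility checks of the real code elided here. */
		yielded = kvm_vcpu_yield_to(vcpu);
		if (yielded > 0) {
			WRITE_ONCE(kvm->last_boosted_vcpu, idx);
			break;
		}
	}

	kvm_vcpu_set_in_spin_loop(me, false);

Starting at last_boosted_vcpu + 1 is what makes the yielding fair: each
successful boost advances the starting point, so no single vCPU is
perpetually favored the way vCPU0 would be with a fixed start.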