Message-ID: <Zj5C2Psbm8EY+Q4F@gmail.com>
Date: Fri, 10 May 2024 08:52:56 -0700
From: Breno Leitao <leitao@...ian.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, rbc@...a.com, paulmck@...nel.org,
"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: Addressing a possible race in kvm_vcpu_on_spin:
On Fri, May 10, 2024 at 07:39:14AM -0700, Sean Christopherson wrote:
> On Fri, May 10, 2024, Breno Leitao wrote:
> > > IMO, reworking it to be like this is more straightforward:
> > >
> > > int nr_vcpus, start, i, idx, yielded;
> > > struct kvm *kvm = me->kvm;
> > > struct kvm_vcpu *vcpu;
> > > int try = 3;
> > >
> > > nr_vcpus = atomic_read(&kvm->online_vcpus);
> > > if (nr_vcpus < 2)
> > > return;
> > >
> > > /* Pairs with the smp_wmb() in kvm_vm_ioctl_create_vcpu(). */
> > > smp_rmb();
> >
> > Why do you need this now? Isn't the RCU read lock in xa_load() enough?
>
> No. RCU read lock doesn't suffice, because on kernels without PREEMPT_COUNT
> rcu_read_lock() may be a literal nop. There may be a _compiler_ barrier, but
> smp_rmb() requires more than a compiler barrier on many architectures.
Makes sense. In fact, it seems better to have an explicit barrier between
the xarray modify operations and reading/storing online_vcpus.
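For reference, the writer side of that pairing would look roughly like the
sketch below. This approximates what kvm_vm_ioctl_create_vcpu() does
upstream, reconstructed from memory rather than quoted verbatim, with the
error handling elided:

	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu,
		      GFP_KERNEL_ACCOUNT);
	if (r)
		return r;	/* error handling elided in this sketch */

	/*
	 * Publish the xarray entry before the updated count, so that a
	 * reader who observes the new online_vcpus value also observes
	 * the vCPU pointer.  Pairs with the smp_rmb() issued after
	 * atomic_read(&kvm->online_vcpus) on the read side.
	 */
	smp_wmb();
	atomic_inc(&kvm->online_vcpus);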
> > > kvm_vcpu_set_in_spin_loop(me, true);
> > >
> > > start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
> > > for (i = 0; i < nr_vcpus; i++) {
> >
> > Why do you need to start at the last boosted vCPU? I.e., why not
> > start at 0 and skip me->vcpu_idx and kvm->last_boosted_vcpu?
>
> To provide round-robin style yielding in order to (hopefully) yield to the vCPU
> that is holding a spinlock (or some other asset that is causing a vCPU to spin
> in kernel mode).
>
> E.g. if there are 4 vCPUs all running on a single CPU, vCPU3 gets preempted while
> holding a spinlock, and all vCPUs are contending for said spinlock, then starting
> at vCPU0 every time would result in vCPU1 yielding to vCPU0, and vCPU0 yielding
> back to vCPU1, indefinitely.
Makes sense, this would always privilege vCPU0 at the expense of the
later vCPUs. 100% clear. Thanks!
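For the archive, the complete round-robin walk being described would look
something like the sketch below. This is my paraphrase of Sean's snippet
above, with the per-vCPU eligibility checks of the real kvm_vcpu_on_spin()
elided, so treat it as illustrative rather than a literal patch:

	kvm_vcpu_set_in_spin_loop(me, true);

	start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
	for (i = 0; i < nr_vcpus; i++) {
		/* Wrap around so the scan starts after the last boostee. */
		idx = (start + i) % nr_vcpus;
		if (idx == me->vcpu_idx)
			continue;

		vcpu = xa_load(&kvm->vcpu_array, idx);
		if (!READ_ONCE(vcpu->ready))
			continue;

		/* Eligibility checks of the real code elided here. */
		yielded = kvm_vcpu_yield_to(vcpu);
		if (yielded > 0) {
			WRITE_ONCE(kvm->last_boosted_vcpu, idx);
			break;
		}
	}

	kvm_vcpu_set_in_spin_loop(me, false);

Starting at last_boosted_vcpu + 1 is what makes the yielding fair: each
successful boost advances the starting point, so no single vCPU is
perpetually favored the way vCPU0 would be with a fixed start.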