Date: Fri, 10 May 2024 07:39:14 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Breno Leitao <leitao@...ian.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>, rbc@...a.com, paulmck@...nel.org, 
	"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: Addressing a possible race in kvm_vcpu_on_spin:

On Fri, May 10, 2024, Breno Leitao wrote:
> > IMO, reworking it to be like this is more straightforward:
> > 
> > 	int nr_vcpus, start, i, idx, yielded;
> > 	struct kvm *kvm = me->kvm;
> > 	struct kvm_vcpu *vcpu;
> > 	int try = 3;
> > 
> > 	nr_vcpus = atomic_read(&kvm->online_vcpus);
> > 	if (nr_vcpus < 2)
> > 		return;
> > 
> > 	/* Pairs with the smp_wmb() in kvm_vm_ioctl_create_vcpu(). */
> > 	smp_rmb();
> 
> Why do you need this now? Isn't the RCU read lock in xa_load() enough?

No.  RCU read lock doesn't suffice, because on kernels without PREEMPT_COUNT
rcu_read_lock() may be a literal nop.  There may be a _compiler_ barrier, but
smp_rmb() requires more than a compiler barrier on many architectures.

And just as importantly, KVM shouldn't be relying on the inner details of other
code without a hard guarantee of that behavior.  E.g. KVM does rely on
srcu_read_unlock() to provide a full memory barrier, but KVM does so through an
"official" API, smp_mb__after_srcu_read_unlock().

> > 	kvm_vcpu_set_in_spin_loop(me, true);
> > 
> > 	start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
> > 	for (i = 0; i < nr_vcpus; i++) {
> 
> Why do you need to start at the last boosted vcpu? I.e., why not
> start at 0 and skip me->vcpu_idx and kvm->last_boosted_vcpu?

To provide round-robin-style yielding in order to (hopefully) yield to the vCPU
that is holding a spinlock (or some other resource that is causing a vCPU to
spin in kernel mode).

E.g. if there are 4 vCPUs all running on a single CPU, vCPU3 gets preempted while
holding a spinlock, and all vCPUs are contending for said spinlock, then starting
at vCPU0 every time would result in vCPU1 yielding to vCPU0, and vCPU0 yielding
back to vCPU1, indefinitely.

Starting at the last boosted vCPU instead results in vCPU0 yielding to vCPU1,
vCPU1 yielding to vCPU2, and vCPU2 yielding to vCPU3, thus getting back to the
vCPU that holds the spinlock soon-ish.
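
As a toy illustration of the visit order (nothing KVM-specific here, the names
are made up), with 4 vCPUs, last_boosted_vcpu == 2 and vCPU1 spinning:

	#include <stdio.h>

	int main(void)
	{
		int nr_vcpus = 4;
		int last_boosted_vcpu = 2;	/* vCPU2 was boosted last time around */
		int me = 1;			/* the vCPU that is currently spinning */
		int start = last_boosted_vcpu + 1;

		printf("vCPU%d tries to yield to:", me);
		for (int i = 0; i < nr_vcpus; i++) {
			int idx = (start + i) % nr_vcpus;

			if (idx == me)		/* don't yield to ourselves */
				continue;
			printf(" vCPU%d", idx);
		}
		printf("\n");			/* prints: vCPU3 vCPU0 vCPU2 */
		return 0;
	}

Each pass restarts the search just past whoever was boosted last, so the lock
holder is reached within one trip around the ring no matter where the previous
search left off.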

I'm sure more sophisticated/performant approaches are possible, but they would
likely be more complex, require persistent state (a.k.a. memory usage), and/or
need knowledge of the workload being run.
