linux-kernel - Re: [PATCH] KVM: Addressing a possible race in kvm_vcpu_on

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zj3ksShWaFSWstii@gmail.com>
Date: Fri, 10 May 2024 02:11:13 -0700
From: Breno Leitao <leitao@...ian.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, rbc@...a.com, paulmck@...nel.org,
	"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: Addressing a possible race in kvm_vcpu_on_spin:

Hello Sean,

On Thu, May 09, 2024 at 09:42:48AM -0700, Sean Christopherson wrote:
> On Thu, May 09, 2024, Breno Leitao wrote:
> >  	kvm_vcpu_set_in_spin_loop(me, true);
> >  	/*
> >  	 * We boost the priority of a VCPU that is runnable but not
> > @@ -4109,7 +4110,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> >  
> >  			yielded = kvm_vcpu_yield_to(vcpu);
> >  			if (yielded > 0) {
> > -				kvm->last_boosted_vcpu = i;
> > +				WRITE_ONCE(kvm->last_boosted_vcpu, i);
> >  				break;
> >  			} else if (yielded < 0) {
> >  				try--;
> 
> Side topic #1: am I the only one that finds these loops unnecessarily hard to
> read?

No. :-)

In fact, when I skimmed over the code, I though that maybe the code was
not covering the vCPUs before last_boosted_vcpu in the array.

Now that I am looking at it carefully, the code is using `pass` to track
if the vCPU passed last_boosted_vcpu in the index.

> Unless I'm misreading the code, it's really just an indirect way of looping
> over all vCPUs, starting at last_boosted_vcpu+1 and the wrapping.
> 
> IMO, reworking it to be like this is more straightforward:
> 
> 	int nr_vcpus, start, i, idx, yielded;
> 	struct kvm *kvm = me->kvm;
> 	struct kvm_vcpu *vcpu;
> 	int try = 3;
> 
> 	nr_vcpus = atomic_read(&kvm->online_vcpus);
> 	if (nr_vcpus < 2)
> 		return;
> 
> 	/* Pairs with the smp_wmb() in kvm_vm_ioctl_create_vcpu(). */
> 	smp_rmb();

Why do you need this now? Isn't the RCU read lock in xa_load() enough?

> 	kvm_vcpu_set_in_spin_loop(me, true);
> 
> 	start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
> 	for (i = 0; i < nr_vcpus; i++) {

Why do you need to started at the last boosted vcpu? I.e, why not
starting at 0 and skipping me->vcpu_idx and kvm->last_boosted_vcpu?

> 		idx = (start + i) % nr_vcpus;
> 		if (idx == me->vcpu_idx)
> 			continue;
> 
> 		vcpu = xa_load(&kvm->vcpu_array, idx);
> 		if (!READ_ONCE(vcpu->ready))
> 			continue;
> 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
> 			continue;
> 
> 		/*
> 		 * Treat the target vCPU as being in-kernel if it has a pending
> 		 * interrupt, as the vCPU trying to yield may be spinning
> 		 * waiting on IPI delivery, i.e. the target vCPU is in-kernel
> 		 * for the purposes of directed yield.
> 		 */
> 		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
> 		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
> 		    !kvm_arch_vcpu_preempted_in_kernel(vcpu))
> 			continue;
> 
> 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> 			continue;
> 
> 		yielded = kvm_vcpu_yield_to(vcpu);
> 		if (yielded > 0) {
> 			WRITE_ONCE(kvm->last_boosted_vcpu, i);
> 			break;
> 		} else if (yielded < 0 && !--try) {
> 			break;
> 		}
> 	}
> 
> 	kvm_vcpu_set_in_spin_loop(me, false);
> 
> 	/* Ensure vcpu is not eligible during next spinloop */
> 	kvm_vcpu_set_dy_eligible(me, false);

I didn't tested it, but I reviewed it, and it seems sane and way easier
to read. I agree this code is easier to read, from someone that has
little KVM background.