Message-Id: <200805081520.38310.borntraeger@de.ibm.com>
Date:	Thu, 8 May 2008 15:20:38 +0200
From:	Christian Borntraeger <borntraeger@...ibm.com>
To:	Rusty Russell <rusty@...tcorp.com.au>
Cc:	Ingo Molnar <mingo@...e.hu>,
	virtualization@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org, kvm-devel@...ts.sourceforge.net
Subject: [PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly

On kvm I have seen some rare hangs in stop_machine when I used more guest
cpus than host cpus. For example, 32 guest cpus on 1 host cpu triggered the
hang quite often. I could also reproduce the problem on a 4-way z/VM host with
a 64-way guest.

It turned out that the guest was consuming all available cpus, mostly for
spinning on scheduler locks like rq->lock. This is expected, as the threads are
calling yield all the time.
The problem is that the host scheduling decisions, together with the guest
scheduling decisions and the spinlocks not being fair, managed to create an
interesting scenario similar to a live lock. (Sometimes the hang resolved
itself after some minutes.)

Changing stop_machine to yield the cpu to the hypervisor when yielding inside 
the guest fixed the problem for me. While I am not completely happy with this 
patch, I think it causes no harm and it really improves the situation for me.
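
To illustrate the idea outside the kernel, here is a minimal user-space sketch
of a wait loop that yields to the local scheduler and then also executes a
relax hint on every iteration. It is not the kernel code: wait_for_acks() and
cpu_relax_sketch() are hypothetical names, and cpu_relax_sketch() is only a
compiler barrier standing in for the architecture-specific cpu_relax(), which
the patch relies on to give the cpu back to the hypervisor.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the kernel's cpu_relax(). Here it is only a
 * compiler barrier; the real cpu_relax() is architecture-specific.
 */
static inline void cpu_relax_sketch(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Wait until *ack reaches 'expected'. Each iteration first yields to the
 * local scheduler and then runs the relax hint, mirroring the shape of the
 * patched loop. Returns the number of loop iterations that were needed.
 */
static unsigned long wait_for_acks(atomic_int *ack, int expected)
{
	unsigned long spins = 0;

	assert(ack != NULL);

	while (atomic_load(ack) != expected) {
		sched_yield();		/* let other runnable threads go first */
		cpu_relax_sketch();	/* hint that we are busy-waiting */
		spins++;
	}
	return spins;
}
```

In a real program the counter would be incremented by worker threads as they
come up; if it already holds the expected value, the loop body never runs and
wait_for_acks() returns 0.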

I used cpu_relax for yielding to the hypervisor. Does that work on all
architectures?

p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use 
stop_machine_run and both triggered the problem after some retries. 


Signed-off-by: Christian Borntraeger <borntraeger@...ibm.com>
CC: Ingo Molnar <mingo@...e.hu>
CC: Rusty Russell <rusty@...tcorp.com.au>

---
 kernel/stop_machine.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: kvm/kernel/stop_machine.c
===================================================================
--- kvm.orig/kernel/stop_machine.c
+++ kvm/kernel/stop_machine.c
@@ -62,8 +62,7 @@ static int stopmachine(void *cpu)
 		 * help our sisters onto their CPUs. */
 		if (!prepared && !irqs_disabled)
 			yield();
-		else
-			cpu_relax();
+		cpu_relax();
 	}
 
 	/* Ack: we are exiting. */
@@ -106,8 +105,10 @@ static int stop_machine(void)
 	}
 
 	/* Wait for them all to come to life. */
-	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads)
+	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) {
 		yield();
+		cpu_relax();
+	}
 
 	/* If some failed, kill them all. */
 	if (ret < 0) {
