linux-kernel - Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121107102545.GA18644@linux.vnet.ibm.com>
Date:	Wed, 7 Nov 2012 15:55:45 +0530
From:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To:	Avi Kivity <avi@...hat.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Ingo Molnar <mingo@...hat.com>, Rik van Riel <riel@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
	Chegu Vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
	Gleb Natapov <gleb@...hat.com>,
	Andrew Jones <drjones@...hat.com>
Subject: Re: [PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for
 potential undercommit case

* Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com> [2012-10-31 22:36:25]:

> On 10/31/2012 07:11 PM, Avi Kivity wrote:
> >On 10/31/2012 03:15 PM, Raghavendra K T wrote:
> >>On 10/31/2012 06:11 PM, Raghavendra K T wrote:
> >>>On 10/31/2012 06:08 PM, Avi Kivity wrote:
> >>>>On 10/29/2012 04:07 PM, Raghavendra K T wrote:
> >>>>>From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
> >>>>>
> >>>>>Also we do not update last boosted vcpu in failure cases.
> >>>>>
> >>>>>   #endif
> >>>>>+
> >>>>>   void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>>>>   {
> >>>>>       struct kvm *kvm = me->kvm;
> >>>>>@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> >>>>>                   continue;
> >>>>>               if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> >>>>>                   continue;
> >>>>>-            if (kvm_vcpu_yield_to(vcpu)) {
> >>>>>+
> >>>>>+            yielded = kvm_vcpu_yield_to(vcpu);
> >>>>>+            if (yielded > 0)
> >>>>>                   kvm->last_boosted_vcpu = i;
> >>>>>-                yielded = 1;
> >>>>>+            if (yielded)
> >>>>>                   break;
> >>>>>-            }
> >>>>>           }
> >>>>
> >>>>If yielded == -ESRCH, should we not try to yield to another vcpu?
> >>>>
> >>>
> >>>   Yes. plan is to abort the iteration. since it means we are mostly
> >>>undercommitted.
> >>
> >>Sorry if it was ambiguous. I wanted to say we do not want to continue
> >>yield to another vcpu..
> >>
> >
> >
> >Why not?  We found that this particular vcpu is running and therefore
> >likely not a lock holder.  That says nothing about other vcpus.  The
> >next in line might be runnable-but-not-running on another runqueue.
> 
> Agree that next in the line might be runnable-not-running.  But here we
> are optimistic that, that is not the case and we save time by
> returning back instead of iterating, thinking we are mostly in
> undercommitted case and each vcpu has dedicated cpu.
> 
> Probably an alternative we have here is to look for say 2-3 successive
> failures before breaking out?

Hi Avi,

I tried the idea of bailing out only when we have successive failure
too (hunk below). results are as follows.

base = 3.7.0-rc1 
A = base +  patch 1 +  patch 2 (original series except patch 3)
B = A + bail out on successive failures patch (hunk below)

% improvements w.r.t base kernel 32 vcpu guest on 32 core PLE machine

                          A               B
kernbench_1x         1.83959         0.95158
kernbench_2x         5.58283         8.31783

ebizzy_1x            144.96959       147.47995	
ebizzy_2x            -11.52278       -4.52835	
ebizzy_3x            -7.59141        -5.17241	
       	
dbench_1x            87.46935        61.14888	
dbench_2x            -7.16749        -4.17130	
dbench_3x            -0.34115        -3.18740


IMO, 
the original patch would have affceted moderate overcommits more
when we have average runqueue length less than two but we are still 
overcommitted.

With this patch though we may get litlle less improvement for
1x overcommit, probability of this affetcting moderate overcommit
situation reduces drastically.

If we consider cases where, we have average run queue length as follows,
and corresponding probability of we considering it as undercommitted (wrongly)
case with rough/simple math is as follows: (correct me if something is
wrong here)

( In precise math it should be the probability of we hitting
a source and target runqueue length one when we are overcommitted for
two successive trials, noting that source of the yield_to remains same)

(Probability = p^3 where p is the probability that we hit a cpu having
rq length = 1. I have taken out #cpus from calculation here though it affects)

average     Probability 
rq length
---------------------------------
1.25       27/64  Note: we are slightly overcommitted here and hopefully it does not afftect much.
1.5        1/8 
1.75       1/64

for original patch it was p^2 and hence 9/16, 1/4, 1/16 respectively.

I think continuing to yield_to beyond this would make us go back to
square one, unnecessarily wasting time.

Please let me know if you have any comments on idea of bailing out after successive trials?

(Side) Note: Dynamic ple window built on top of this logic is already done. and
will be posting it with results in a separate patch.

Changed hunk:
---8<---
@@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 	return eligible;
 }
 #endif
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
 	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
 	int yielded = 0;
+	int try = 2;
 	int pass;
 	int i;
 
@@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 * VCPU is holding the lock that we need and will release it.
 	 * We approximate round-robin by starting at the last boosted VCPU.
 	 */
-	for (pass = 0; pass < 2 && !yielded; pass++) {
+	for (pass = 0; pass < 2 && !yielded && try; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
@@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
-			if (kvm_vcpu_yield_to(vcpu)) {
+
+			yielded = kvm_vcpu_yield_to(vcpu);
+			if (yielded > 0) {
 				kvm->last_boosted_vcpu = i;
-				yielded = 1;
 				break;
+			} else if (yielded < 0) {
+				try--;
+				if (!try)
+					break;
 			}
 		}
 	}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/