Message-ID: <20130123135707.GA5817@hawk.usersys.redhat.com>
Date:	Wed, 23 Jan 2013 14:57:08 +0100
From:	Andrew Jones <drjones@...hat.com>
To:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Avi Kivity <avi.kivity@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Gleb Natapov <gleb@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Srikar <srikar@...ux.vnet.ibm.com>,
	"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
	KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
	Chegu Vinod <chegu_vinod@...com>,
	"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>
Subject: Re: [PATCH V3 RESEND RFC 0/2] kvm: Improving undercommit scenarios

On Tue, Jan 22, 2013 at 01:08:54PM +0530, Raghavendra K T wrote:
>  In some special scenarios, such as #vcpu <= #pcpu, the PLE handler can
> prove very costly: there is no point in iterating over vcpus and burning
> CPU on yield_to calls that are bound to fail.
> 
>  The first patch optimizes yield_to by bailing out early when there is
>  no point in continuing (i.e., when both the source and the target rq
>  have only one task).
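
A minimal sketch of that bail-out, assuming it sits in yield_to() in
kernel/sched/core.c; the label, locking context, and errno choice here are
illustrative rather than the literal patch:

	rq = this_rq();
	p_rq = task_rq(p);
	/*
	 * If our runqueue and the target's runqueue each hold exactly one
	 * runnable task, yielding cannot help anyone; fail with a distinct
	 * error code so the caller (the KVM PLE handler) can react.
	 */
	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
		yielded = -ESRCH;	/* assumed errno: "no one to yield to" */
		goto out;		/* unlock the rqs and return yielded */
	}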
> 
>  The second patch uses that in the PLE handler. Further, when a yield_to
>  fails we do not leave the PLE handler immediately; instead we retry up to
>  three times, so that a single spurious failure does not bail us out, which
>  would otherwise hurt moderate-overcommit cases.
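
In the PLE handler that failure return is then consumed with a bounded
retry. A rough sketch of the loop in kvm_vcpu_on_spin() (virt/kvm/kvm_main.c);
candidate filtering is elided and the exact structure is assumed, not quoted
from the patch:

	int yielded = 0;
	int try = 3;	/* tolerate a few failed yields before giving up */
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		/* skipping ourselves and non-eligible vcpus (elided) */
		yielded = kvm_vcpu_yield_to(vcpu);
		if (yielded > 0)
			break;	/* boosted somebody: done */
		if (yielded < 0 && !--try)
			break;	/* repeated failure: likely undercommit, leave */
	}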
>  
>  Results on a 3.7.0-rc6 kernel show around a 140% improvement for ebizzy 1x
>  and around 51% for dbench 1x, on a 32-core PLE machine with a 32-vcpu guest.
> 
> 
> base = 3.7.0-rc6 
> machine: 32-core mx3850 x5 PLE machine
> 
> --+-----------+-----------+-----------+------------+-----------+
>                ebizzy (rec/sec, higher is better)
> --+-----------+-----------+-----------+------------+-----------+
>     base        stdev       patched     stdev       %improve     
> --+-----------+-----------+-----------+------------+-----------+
> 1x   2511.3000    21.5409    6051.8000   170.2592   140.98276   
> 2x   2679.4000   332.4482    2692.3000   251.4005     0.48145
> 3x   2253.5000   266.4243    2192.1667   178.9753    -2.72169
> --+-----------+-----------+-----------+------------+-----------+
> 
> --+-----------+-----------+-----------+------------+-----------+
>         dbench (throughput in MB/sec, higher is better)
> --+-----------+-----------+-----------+------------+-----------+
>     base        stdev       patched     stdev       %improve     
> --+-----------+-----------+-----------+------------+-----------+
> 1x  6677.4080   638.5048    10098.0060   3449.7026     51.22643
> 2x  2012.6760    64.7642    2019.0440     62.6702       0.31639
> 3x  1302.0783    40.8336    1292.7517     27.0515      -0.71629
> --+-----------+-----------+-----------+------------+-----------+
> 
> For reference, here are the no-PLE results:
>  ebizzy-1x_nople 7592.6000 rec/sec
>  dbench_1x_nople 7853.6960 MB/sec

I'm not sure how much we should trust the ebizzy results, but even
so, the dbench results are stranger still. The percent error is huge
(34%), and somehow the patched version does much better at 1x
overcommit with PLE enabled than without it. How does that happen?
How many guests are running in the 1x test? And are the throughput
results the combined throughput of all of them? I wonder whether this
jump in throughput is just the guests' perceived throughput, inflated
by bad virtual timekeeping. Can we run a long-lasting benchmark and
measure the elapsed time with a clock external to the guests?

Drew
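
For instance, something along these lines would do: a small host-side
harness (hypothetical, not from this thread) that runs the benchmark
command and reports elapsed wall-clock time from the host's monotonic
clock, so guest-side timekeeping cannot skew the result:

	/* hosttime.c: time a command with the host's CLOCK_MONOTONIC.
	 * Hypothetical helper for cross-checking guest-reported numbers. */
	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	int main(int argc, char **argv)
	{
		struct timespec t0, t1;
		double secs;

		if (argc < 2) {
			fprintf(stderr, "usage: %s '<command>'\n", argv[0]);
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (system(argv[1]) == -1)	/* run the benchmark */
			return 1;
		clock_gettime(CLOCK_MONOTONIC, &t1);
		secs = (t1.tv_sec - t0.tv_sec) +
		       (t1.tv_nsec - t0.tv_nsec) / 1e9;
		printf("elapsed: %.3f s\n", secs);
		return 0;
	}

Dividing the total work done (e.g. bytes written by dbench) by this
externally measured time gives a throughput figure the guests cannot
inflate.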

> 
> The results suggest we can still improve ebizzy by about 60%, but overall we
> are already getting impressive performance from the patches.
> 
>  Changes Since V2:
>  - Dropped global measures usage patch (Peter Zijlstra)
>  - Do not bail out on first failure (Avi Kivity)
>  - Retry up to three times on yield_to failure for statistically more robust
>    behaviour.
> 
>  Changes since V1:
>  - Discarded the idea of exporting nr_running and optimized in the core
>    scheduler instead (Peter)
>  - Use yield() instead of schedule() in overcommit scenarios (Rik)
>  - Use loadavg knowledge to detect undercommit/overcommit
> 
>  Peter Zijlstra (1):
>   Bail out of yield_to when source and target runqueue has one task
> 
>  Raghavendra K T (1):
>   Handle yield_to failure return for potential undercommit case
> 
>  Please let me know your comments and suggestions.
> 
>  Link for the discussion of V3 original:
>  https://lkml.org/lkml/2012/11/26/166
> 
>  Link for V2:
>  https://lkml.org/lkml/2012/10/29/287
> 
>  Link for V1:
>  https://lkml.org/lkml/2012/9/21/168
> 
>  kernel/sched/core.c | 25 +++++++++++++++++++------
>  virt/kvm/kvm_main.c | 26 ++++++++++++++++----------
>  2 files changed, 35 insertions(+), 16 deletions(-)
> 