Message-ID: <505C691D.4080801@hp.com>
Date: Fri, 21 Sep 2012 06:18:21 -0700
From: Chegu Vinod <chegu_vinod@...com>
To: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@...radead.org>,
"H. Peter Anvin" <hpa@...or.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Ingo Molnar <mingo@...hat.com>, Avi Kivity <avi@...hat.com>,
Rik van Riel <riel@...hat.com>,
Srikar <srikar@...ux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
Gleb Natapov <gleb@...hat.com>
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit, overcommit scenarios
in PLE handler
On 9/21/2012 4:59 AM, Raghavendra K T wrote:
> In some special scenarios like #vcpu <= #pcpu, the PLE handler may
> prove very costly,
Yes.
> because there is no need to iterate over vcpus
> and do unsuccessful yield_to calls that just burn CPU.
>
> An idea to solve this is:
> 1) As Avi had proposed, we can modify the hardware ple_window
> dynamically to avoid frequent PLE exits.
Yes. We had to do this to get around some scaling issues for large
(>20-way) guests (with no overcommitment).
As part of some experimentation we even tried "switching off" PLE too :(
> (IMHO, it is difficult to
> decide when we have mixed types of VMs.)
Agree.
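To make that idea concrete, here is a minimal sketch of dynamic window
resizing on the VMX side. Only PLE_WINDOW is a real VMCS field here;
the per-vcpu ple_window member, the bounds, and the doubling/halving
policy are assumptions for illustration, not the actual proposal:

	/*
	 * Hypothetical sketch: grow the PLE window when a PLE exit
	 * finds no runnable yield_to target (exits are unproductive,
	 * so exit less often), and shrink it when a directed yield
	 * succeeds (react faster to contended locks).
	 */
	#define PLE_WINDOW_MIN	4096
	#define PLE_WINDOW_MAX	(16 * 4096)

	static void grow_ple_window(struct vcpu_vmx *vmx)
	{
		vmx->ple_window = min_t(unsigned int,
					vmx->ple_window * 2, PLE_WINDOW_MAX);
		vmcs_write32(PLE_WINDOW, vmx->ple_window);
	}

	static void shrink_ple_window(struct vcpu_vmx *vmx)
	{
		vmx->ple_window = max_t(unsigned int,
					vmx->ple_window / 2, PLE_WINDOW_MIN);
		vmcs_write32(PLE_WINDOW, vmx->ple_window);
	}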
Not sure if the following alternatives have also been looked at:
- Could the behavior associated with the "ple_window" be modified to
be a function of some [new] per-guest attribute (which can be conveyed
to the host as part of the guest launch sequence)? The user can choose
to set this [new] attribute for a given guest. This would help avoid the
frequent exits due to PLE (as Avi had mentioned earlier).
- Can the PLE feature (in VT) be "enhanced" to be made a per-guest
attribute (see the sketch below)?
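Something along these lines, purely as a sketch: the struct, the ioctl
handler, and the per-VM fields below are all hypothetical, not an
existing KVM interface.

	/* Hypothetical per-guest PLE attribute, set once at guest launch. */
	struct kvm_ple_param {
		__u32 ple_gap;		/* cycles between PAUSEs in a spin loop */
		__u32 ple_window;	/* cycles spun before forcing a PLE exit */
	};

	/* Hypothetical KVM_SET_PLE_PARAM vm-ioctl handler. */
	static int kvm_vm_ioctl_set_ple(struct kvm *kvm, struct kvm_ple_param *p)
	{
		if (p->ple_gap && !p->ple_window)
			return -EINVAL;
		kvm->arch.ple_gap = p->ple_gap;		/* assumed per-VM fields, */
		kvm->arch.ple_window = p->ple_window;	/* loaded into each vcpu's */
		return 0;				/* VMCS on the next entry  */
	}

A ple_window of 0 could then serve as the per-guest "switch PLE off"
knob mentioned above.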
IMHO, the approach of avoiding frequent exits in the first place is
better than taking an exit and simply returning from the handler.
Thanks
Vinod
>
> Another idea, proposed in the first patch, is to identify the
> non-overcommit case and just return from the PLE handler.
>
> There are many ways to identify a non-overcommit scenario:
> 1) Using loadavg etc (get_avenrun/calc_global_load
> /this_cpu_load)
>
> 2) Explicitly check nr_running()/num_online_cpus()
>
> 3) Check source vcpu runqueue length.
>
> Not sure how we can make use of (1) effectively.
> (2) has significant overhead since it iterates over all cpus.
> So this patch uses the third method. (I feel it is ugly to export
> the runqueue length, but I am expecting suggestions on this.)
>
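(To make (3) concrete, here is a rough sketch of exporting and checking
the source runqueue length; the helper name and the exact placement are
illustrative and may not match the actual patch.)

	/* kernel/sched/core.c: expose the current cpu's runqueue length. */
	unsigned long rq_nr_running(void)
	{
		return this_rq()->nr_running;	/* racy read, but only a heuristic */
	}
	EXPORT_SYMBOL(rq_nr_running);

	/* virt/kvm/kvm_main.c: early-out at the top of kvm_vcpu_on_spin(). */
	if (rq_nr_running() == 1)
		return;		/* undercommitted: nobody else wants this pcpu */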
> In the second patch: when we have a large number of small guests, it
> is possible that a spinning vcpu fails to yield_to any vcpu of the
> same VM and goes back to spinning. This is not effective when we are
> over-committed either. Instead, we do a schedule() so that we give
> other VMs a chance to run.
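(As a sketch, with a `yielded` flag assumed at the end of the candidate
scan in kvm_vcpu_on_spin():)

	/*
	 * If no vcpu of this VM could be yielded to, don't go back to
	 * spinning: give the pcpu away so another VM's vcpu can run.
	 */
	if (!yielded)
		schedule();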
>
> Raghavendra K T (2):
> Handle undercommitted guest case in PLE handler
> Be courteous to other VMs in overcommitted scenario in PLE handler
>
> Results:
> base = 3.6.0-rc5 + ple handler optimization patches from kvm tree.
> patched = base + patch1 + patch2
> machine: x240 with 16 cores, HT enabled (32 cpu threads).
> 32 vcpu guest with 8GB RAM.
>
> +-----------+-----------+-----------+------------+-----------+
> ebizzy (records/sec, higher is better)
> +-----------+-----------+-----------+------------+-----------+
> base stddev patched stddev %improve
> +-----------+-----------+-----------+------------+-----------+
> 11293.3750 624.4378 18209.6250 371.7061 61.24166
> 3641.8750 468.9400 3725.5000 253.7823 2.29621
> +-----------+-----------+-----------+------------+-----------+
>
> +-----------+-----------+-----------+------------+-----------+
> kernbench (time in sec, lower is better)
> +-----------+-----------+-----------+------------+-----------+
> base stddev patched stddev %improve
> +-----------+-----------+-----------+------------+-----------+
> 30.6020 1.3018 30.8287 1.1517 -0.74080
> 64.0825 2.3764 63.4721 5.0191 0.95252
> 95.8638 8.7030 94.5988 8.3832 1.31958
> +-----------+-----------+-----------+------------+-----------+
>
> Note:
> on an mx3850x5 machine with 32 cores and HT disabled, I got around
> ebizzy: 209%
> kernbench: 6%
> improvement for the 1x scenario.
>
> Thanks to Srikar for his active participation in discussing ideas and
> reviewing the patch.
>
> Please let me know your suggestions and comments.
> ---
> include/linux/sched.h | 1 +
> kernel/sched/core.c | 6 ++++++
> virt/kvm/kvm_main.c | 7 +++++++
> 3 files changed, 14 insertions(+), 0 deletions(-)
>
>