Message-ID: <505C691D.4080801@hp.com>
Date: Fri, 21 Sep 2012 06:18:21 -0700
From: Chegu Vinod <chegu_vinod@...com>
To: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@...radead.org>,
"H. Peter Anvin" <hpa@...or.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Ingo Molnar <mingo@...hat.com>, Avi Kivity <avi@...hat.com>,
Rik van Riel <riel@...hat.com>,
Srikar <srikar@...ux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>,
Gleb Natapov <gleb@...hat.com>
Subject: Re: [PATCH RFC 0/2] kvm: Improving undercommit, overcommit scenarios
in PLE handler
On 9/21/2012 4:59 AM, Raghavendra K T wrote:
> In some special scenarios like #vcpu <= #pcpu, the PLE handler may
> prove very costly,
Yes.
> because there is no need to iterate over vcpus
> and do unsuccessful yield_to calls that just burn CPU.
>
> An idea to solve this is:
> 1) As Avi had proposed, we can modify the hardware ple_window
> dynamically to avoid frequent PLE exits.
Yes. We had to do this to get around some scaling issues for large
(>20-way) guests (with no overcommitment).
As part of some experimentation we even tried "switching off" PLE too :(
> (IMHO, it is difficult to
> decide when we have mixed types of VMs.)
Agree.
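To make that idea concrete, here is a minimal sketch of dynamic window
resizing on the VMX side. Only PLE_WINDOW is a real VMCS field here;
the per-vcpu ple_window member, the bounds, and the doubling/halving
policy are assumptions for illustration, not the actual proposal:

	/*
	 * Hypothetical sketch: grow the PLE window when a PLE exit
	 * finds no runnable yield_to target (exits are unproductive,
	 * so exit less often), and shrink it when a directed yield
	 * succeeds (react faster to contended locks).
	 */
	#define PLE_WINDOW_MIN	4096
	#define PLE_WINDOW_MAX	(16 * 4096)

	static void grow_ple_window(struct vcpu_vmx *vmx)
	{
		vmx->ple_window = min_t(unsigned int,
					vmx->ple_window * 2, PLE_WINDOW_MAX);
		vmcs_write32(PLE_WINDOW, vmx->ple_window);
	}

	static void shrink_ple_window(struct vcpu_vmx *vmx)
	{
		vmx->ple_window = max_t(unsigned int,
					vmx->ple_window / 2, PLE_WINDOW_MIN);
		vmcs_write32(PLE_WINDOW, vmx->ple_window);
	}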
Not sure if the following alternatives have also been looked at:
- Could the behavior associated with the "ple_window" be modified to
be a function of some [new] per-guest attribute (which can be conveyed
to the host as part of the guest launch sequence)? The user can choose
to set this [new] attribute for a given guest. This would help avoid the
frequent exits due to PLE (as Avi had mentioned earlier).
- Can the PLE feature (in VT) be "enhanced" to be made a per-guest
attribute (see the sketch below)?
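Something along these lines, purely as a sketch: the struct, the ioctl
handler, and the per-VM fields below are all hypothetical, not an
existing KVM interface.

	/* Hypothetical per-guest PLE attribute, set once at guest launch. */
	struct kvm_ple_param {
		__u32 ple_gap;		/* cycles between PAUSEs in a spin loop */
		__u32 ple_window;	/* cycles spun before forcing a PLE exit */
	};

	/* Hypothetical KVM_SET_PLE_PARAM vm-ioctl handler. */
	static int kvm_vm_ioctl_set_ple(struct kvm *kvm, struct kvm_ple_param *p)
	{
		if (p->ple_gap && !p->ple_window)
			return -EINVAL;
		kvm->arch.ple_gap = p->ple_gap;		/* assumed per-VM fields, */
		kvm->arch.ple_window = p->ple_window;	/* loaded into each vcpu's */
		return 0;				/* VMCS on the next entry  */
	}

A ple_window of 0 could then serve as the per-guest "switch PLE off"
knob mentioned above.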
IMHO, the approach of avoiding frequent exits in the first place is
better than taking an exit and simply returning from the handler.
Thanks
Vinod
>
> Another idea, proposed in the first patch, is to identify the
> non-overcommit case and just return from the PLE handler.
>
> There are many ways to identify a non-overcommit scenario:
> 1) Using loadavg etc (get_avenrun/calc_global_load
> /this_cpu_load)
>
> 2) Explicitly check nr_running()/num_online_cpus()
>
> 3) Check source vcpu runqueue length.
>
> Not sure how we can make use of (1) effectively.
> (2) has significant overhead since it iterates over all cpus.
> So this patch uses the third method. (I feel it is ugly to export
> the runqueue length, but I am expecting suggestions on this.)
>
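(To make (3) concrete, here is a rough sketch of exporting and checking
the source runqueue length; the helper name and the exact placement are
illustrative and may not match the actual patch.)

	/* kernel/sched/core.c: expose the current cpu's runqueue length. */
	unsigned long rq_nr_running(void)
	{
		return this_rq()->nr_running;	/* racy read, but only a heuristic */
	}
	EXPORT_SYMBOL(rq_nr_running);

	/* virt/kvm/kvm_main.c: early-out at the top of kvm_vcpu_on_spin(). */
	if (rq_nr_running() == 1)
		return;		/* undercommitted: nobody else wants this pcpu */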
> In the second patch: when we have a large number of small guests, it
> is possible that a spinning vcpu fails to yield_to any vcpu of the
> same VM and goes back to spinning. This is not effective when we are
> over-committed either. Instead, we do a schedule() so that we give
> other VMs a chance to run.
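(As a sketch, with a `yielded` flag assumed at the end of the candidate
scan in kvm_vcpu_on_spin():)

	/*
	 * If no vcpu of this VM could be yielded to, don't go back to
	 * spinning: give the pcpu away so another VM's vcpu can run.
	 */
	if (!yielded)
		schedule();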
>
> Raghavendra K T (2):
> Handle undercommitted guest case in PLE handler
> Be courteous to other VMs in overcommitted scenario in PLE handler
>
> Results:
> base = 3.6.0-rc5 + ple handler optimization patches from kvm tree.
> patched = base + patch1 + patch2
> machine: x240 with 16 cores, HT enabled (32 cpu threads).
> 32 vcpu guest with 8GB RAM.
>
> +-----------+-----------+-----------+------------+-----------+
> ebizzy (records/sec, higher is better)
> +-----------+-----------+-----------+------------+-----------+
> base stddev patched stddev %improve
> +-----------+-----------+-----------+------------+-----------+
> 11293.3750 624.4378 18209.6250 371.7061 61.24166
> 3641.8750 468.9400 3725.5000 253.7823 2.29621
> +-----------+-----------+-----------+------------+-----------+
>
> +-----------+-----------+-----------+------------+-----------+
> kernbench (time in sec, lower is better)
> +-----------+-----------+-----------+------------+-----------+
> base stddev patched stddev %improve
> +-----------+-----------+-----------+------------+-----------+
> 30.6020 1.3018 30.8287 1.1517 -0.74080
> 64.0825 2.3764 63.4721 5.0191 0.95252
> 95.8638 8.7030 94.5988 8.3832 1.31958
> +-----------+-----------+-----------+------------+-----------+
>
> Note:
> on an mx3850x5 machine with 32 cores and HT disabled, I got around
> ebizzy: 209%
> kernbench: 6%
> improvement for the 1x scenario.
>
> Thanks to Srikar for his active participation in discussing ideas and
> reviewing the patch.
>
> Please let me know your suggestions and comments.
> ---
> include/linux/sched.h | 1 +
> kernel/sched/core.c | 6 ++++++
> virt/kvm/kvm_main.c | 7 +++++++
> 3 files changed, 14 insertions(+), 0 deletions(-)
>
>