[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121126134301.GB9830@turtle.usersys.redhat.com>
Date: Mon, 26 Nov 2012 14:43:02 +0100
From: Andrew Jones <drjones@...hat.com>
To: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
riel@...hat.com
Cc: Peter Zijlstra <peterz@...radead.org>,
"H. Peter Anvin" <hpa@...or.com>, Avi Kivity <avi@...hat.com>,
Gleb Natapov <gleb@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Rik van Riel <riel@...hat.com>,
Srikar <srikar@...ux.vnet.ibm.com>,
"Nikunj A. Dadhania" <nikunj@...ux.vnet.ibm.com>,
KVM <kvm@...r.kernel.org>, Jiannan Ouyang <ouyang@...pitt.edu>,
Chegu Vinod <chegu_vinod@...com>,
"Andrew M. Theurer" <habanero@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>,
Srivatsa Vaddagiri <srivatsa.vaddagiri@...il.com>
Subject: Re: [PATCH V3 RFC 2/2] kvm: Handle yield_to failure return code for
potential undercommit case
On Mon, Nov 26, 2012 at 05:38:04PM +0530, Raghavendra K T wrote:
> From: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
>
> yield_to returns -ESRCH, When source and target of yield_to
> run queue length is one. When we see three successive failures of
> yield_to we assume we are in potential undercommit case and abort
> from PLE handler.
> The assumption is backed by low probability of wrong decision
> for even worst case scenarios such as average runqueue length
> between 1 and 2.
>
> note that we do not update last boosted vcpu in failure cases.
> Thank Avi for raising question on aborting after first fail from yield_to.
>
> Reviewed-by: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
> ---
> virt/kvm/kvm_main.c | 26 ++++++++++++++++----------
> 1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index be70035..053f494 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
> {
> struct pid *pid;
> struct task_struct *task = NULL;
> + bool ret = false;
>
> rcu_read_lock();
> pid = rcu_dereference(target->pid);
> @@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
> task = get_pid_task(target->pid, PIDTYPE_PID);
> rcu_read_unlock();
> if (!task)
> - return false;
> + return ret;
> if (task->flags & PF_VCPU) {
> put_task_struct(task);
> - return false;
> - }
> - if (yield_to(task, 1)) {
> - put_task_struct(task);
> - return true;
> + return ret;
> }
> + ret = yield_to(task, 1);
> put_task_struct(task);
> - return false;
> +
> + return ret;
> }
> EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
>
> @@ -1697,12 +1696,14 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
> return eligible;
> }
> #endif
> +
> void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> {
> struct kvm *kvm = me->kvm;
> struct kvm_vcpu *vcpu;
> int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
> int yielded = 0;
> + int try = 3;
> int pass;
> int i;
>
> @@ -1714,7 +1715,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> * VCPU is holding the lock that we need and will release it.
> * We approximate round-robin by starting at the last boosted VCPU.
> */
> - for (pass = 0; pass < 2 && !yielded; pass++) {
> + for (pass = 0; pass < 2 && !yielded && try; pass++) {
> kvm_for_each_vcpu(i, vcpu, kvm) {
> if (!pass && i <= last_boosted_vcpu) {
> i = last_boosted_vcpu;
> @@ -1727,10 +1728,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
> continue;
> if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
> continue;
> - if (kvm_vcpu_yield_to(vcpu)) {
> +
> + yielded = kvm_vcpu_yield_to(vcpu);
> + if (yielded > 0) {
> kvm->last_boosted_vcpu = i;
> - yielded = 1;
> break;
> + } else if (yielded < 0) {
> + try--;
> + if (!try)
> + break;
> }
> }
> }
>
The check done in patch 1/2 is done before the double_rq_lock, so it's
cheap. Now, this patch is to avoid doing too many get_pid_task calls. I
wonder if it would make more sense to change the vcpu state from tracking
the pid to tracking the task. If that was done, then I don't believe this
patch is necessary.
Rik,
for 34bb10b79de7 was there a reason pid was used instead of task?
Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists