Message-ID: <5B4D951D.9050504@huawei.com>
Date: Tue, 17 Jul 2018 15:05:01 +0800
From: "Longpeng (Mike)" <longpeng2@...wei.com>
To: <peterz@...radead.org>, <pjt@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
kvm <kvm@...r.kernel.org>
CC: Wanpeng Li <kernellwp@...il.com>,
Xiexiangyou <xiexiangyou@...wei.com>,
"Huangweidong (C)" <weidong.huang@...wei.com>,
Gonglei <arei.gonglei@...wei.com>,
"weiqi (C)" <weiqi4@...wei.com>
Subject: [ RFC ] Set quota on VM cause large schedule latency of vcpu
The virtual machine has the following cgroup hierarchy:
               root
                |
              vm_tg
             (cfs_rq)
             /      \
          (se)      (se)
          tg_A      tg_B
        (cfs_rq)  (cfs_rq)
          /            \
        (se)          (se)
         a              b
'a' and 'b' are two vcpus of the VM.
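In this hierarchy, a throttle on vm_tg does not stay on vm_tg alone. In the kernel, throttle_cfs_rq() walks the task-group subtree and tg_throttle_down() increments throttle_count on the cfs_rq of the throttled group and of every descendant, which is how tg_A->cfs_rq->throttle_count can become 1 when the quota is set on vm_tg. The sketch below is a simplified, hypothetical model of that walk: struct model_cfs_rq and the model_* functions are invented for illustration, and the real code works per CPU on struct cfs_rq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for one CPU's group cfs_rq.
 * Only the fields needed to follow the report are modeled. */
struct model_cfs_rq {
    int throttle_count;               /* > 0 while self or an ancestor is throttled */
    struct model_cfs_rq *children[2]; /* child groups' cfs_rqs on this CPU */
    size_t nr_children;
};

/* Modeled on tg_throttle_down(): throttling a group bumps throttle_count
 * on its own cfs_rq and, recursively, on every descendant's cfs_rq. */
static void model_throttle_down(struct model_cfs_rq *cfs_rq)
{
    assert(cfs_rq != NULL);
    cfs_rq->throttle_count++;
    for (size_t i = 0; i < cfs_rq->nr_children; i++)
        model_throttle_down(cfs_rq->children[i]);
}

/* Modeled on tg_unthrottle_up(): the reverse walk when the quota is refilled. */
static void model_unthrottle_up(struct model_cfs_rq *cfs_rq)
{
    assert(cfs_rq != NULL && cfs_rq->throttle_count > 0);
    cfs_rq->throttle_count--;
    for (size_t i = 0; i < cfs_rq->nr_children; i++)
        model_unthrottle_up(cfs_rq->children[i]);
}

/* Like throttled_hierarchy(), minus its cfs_bandwidth_used() check. */
static int model_throttled_hierarchy(const struct model_cfs_rq *cfs_rq)
{
    return cfs_rq->throttle_count != 0;
}
```

Building vm_tg with tg_A and tg_B as children and calling model_throttle_down(&vm_tg) leaves all three with throttle_count == 1, so model_throttled_hierarchy() is true for tg_A although only vm_tg ran out of quota.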
We set a CFS quota on vm_tg, and the scheduling latency of vcpu a or b can then
become very large, exceeding 2 s.
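For context on what the quota does: a CFS quota on vm_tg caps the CPU time that all tasks under vm_tg may consume in each period (cpu.cfs_quota_us per cpu.cfs_period_us in the cgroup v1 cpu controller); once it is used up, the group is throttled until the period timer refills it. The sketch below is a hypothetical single-pool model of that rule only. The kernel hands runtime to per-CPU cfs_rqs in slices and pays back overruns, which this model deliberately ignores, and struct model_bandwidth and the model_bw_* functions are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-pool model of CFS bandwidth control for one group.
 * Times are in microseconds. */
struct model_bandwidth {
    int64_t quota_us;     /* cf. cpu.cfs_quota_us */
    int64_t period_us;    /* cf. cpu.cfs_period_us (the refill interval) */
    int64_t remaining_us; /* runtime left in the current period */
    bool throttled;
};

static void model_bw_init(struct model_bandwidth *bw,
                          int64_t quota_us, int64_t period_us)
{
    assert(quota_us > 0 && period_us > 0);
    bw->quota_us = quota_us;
    bw->period_us = period_us;
    bw->remaining_us = quota_us;
    bw->throttled = false;
}

/* Charge CPU time consumed by any task in the group, on any CPU;
 * once the quota is used up, the whole group is throttled. */
static void model_bw_charge(struct model_bandwidth *bw, int64_t ran_us)
{
    bw->remaining_us -= ran_us;
    if (bw->remaining_us <= 0)
        bw->throttled = true;
}

/* Called once every period_us: refill the quota and unthrottle
 * (overrun debt is not carried over in this simplified model). */
static void model_bw_period_elapsed(struct model_bandwidth *bw)
{
    bw->remaining_us = bw->quota_us;
    bw->throttled = false;
}
```

For example, with a 50 ms quota per 100 ms period, once 'b' has consumed 50 ms of CPU time the group is throttled, and with it everything below vm_tg, until the next refill.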
We used perf sched to capture the latency (perf sched record -a sleep 10;
perf sched lat -p --sort=max); the result is as follows:
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |
------------------------------------------------------------------------
CPU 0/KVM| 260.261 ms | 50 | avg: 82.017 ms | max: 2510.990 ms |
...
We tested the latest kernel and got the same result.
After adding some tracepoints, we found that the following sequence causes the issue:
1) 'a' is the only task in tg_A. When 'a' goes to sleep (e.g. the vcpu halts),
tg_A is dequeued and tg_A->se->load.weight is set to MIN_SHARES.
2) 'b' keeps running and exhausts vm_tg's quota, so vm_tg is throttled. The
throttle propagates to vm_tg's child groups, so tg_A->cfs_rq->throttle_count=1.
3) Something wakes 'a' (e.g. the vcpu receives a virq). When tg_A is enqueued,
tg_A->se->load.weight is not updated because tg_A->cfs_rq->throttle_count=1.
4) After one CFS quota period, vm_tg is unthrottled.
5) 'a' starts running.
6) One tick later, when tg_A->se's vruntime is updated, tg_A->se->load.weight is
still MIN_SHARES, so tg_A->se's vruntime grows by a very large amount.
7) As a result, 'a' suffers a large scheduling latency.
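Step 6 follows from how CFS advances vruntime: calc_delta_fair() scales the real execution time by NICE_0_LOAD / se->load.weight, so an entity still charged at MIN_SHARES accumulates vruntime far faster than a nice-0 entity. The sketch below uses plain division and assumed constants (an unscaled nice-0 weight of 1024 and MIN_SHARES = 2); the kernel's __calc_delta() uses a fixed-point inverse-weight multiply, and 64-bit kernels carry extra load resolution, so the exact factor there differs. The MODEL_* names and model_vruntime_delta() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative constants: unscaled nice-0 weight and the
 * MIN_SHARES floor. Not the exact fixed-point values a 64-bit kernel uses. */
#define MODEL_NICE_0_LOAD 1024ULL
#define MODEL_MIN_SHARES  2ULL

/* Simplified calc_delta_fair(): vruntime advances by delta_exec scaled by
 * NICE_0_LOAD / weight, so a tiny weight makes vruntime grow very fast. */
static uint64_t model_vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
    assert(weight > 0);
    return delta_exec_ns * MODEL_NICE_0_LOAD / weight;
}
```

With these assumed constants, a single 4 ms tick (HZ=250, also an assumption) charged at MIN_SHARES advances the entity's vruntime by 2048 ms, whereas the same tick at the nice-0 weight advances it by 4 ms.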
As a *crude* experiment, we removed the check that prevents tg_A->se->load.weight
from being reweighted in step 3, as shown below, and the problem disappeared:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be..348ccd6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3016,9 +3016,6 @@ static void update_cfs_group(struct sched_entity *se)
 	if (!gcfs_rq)
 		return;
 
-	if (throttled_hierarchy(gcfs_rq))
-		return;
-
 #ifndef CONFIG_SMP
 	runnable = shares = READ_ONCE(gcfs_rq->tg->shares);
 
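The effect of this diff on steps 1 to 3 can be modeled in a few lines. This is a hypothetical simplification: struct model_group, model_update_cfs_group() and target_shares are invented here, target_shares stands in for whatever the real shares calculation would return at wakeup, and MIN_SHARES is taken as 2, matching the (1UL << 1) definition in kernel/sched/sched.h.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MIN_SHARES 2L

/* Hypothetical stand-in for a group entity (tg_X->se) and its group
 * cfs_rq (tg_X->cfs_rq); only the fields the report mentions are kept. */
struct model_group {
    long se_weight;     /* tg_X->se->load.weight */
    int throttle_count; /* tg_X->cfs_rq->throttle_count */
    long target_shares; /* assumed result of the shares calculation now */
};

/* Simplified update_cfs_group(). With keep_throttle_check = true it keeps
 * the early return that the diff deletes; with false it always reweights
 * the group entity. */
static void model_update_cfs_group(struct model_group *g, bool keep_throttle_check)
{
    assert(g->target_shares >= MODEL_MIN_SHARES);
    if (keep_throttle_check && g->throttle_count > 0)
        return; /* weight stays stale, e.g. still MIN_SHARES */
    g->se_weight = g->target_shares;
}
```

Setting se_weight to MIN_SHARES (step 1) and throttle_count to 1 (step 2), the step-3 call with the check kept leaves tg_A->se at MIN_SHARES; with the check removed, the same call restores the target weight even while the hierarchy is throttled.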
Do you have any suggestions for this problem? Is there a better way to fix
it?
--
Regards,
Longpeng(Mike)