Message-ID: <20180720131555.GN2476@hirez.programming.kicks-ass.net>
Date: Fri, 20 Jul 2018 15:15:55 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Xiexiangyou <xiexiangyou@...wei.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"pjt@...gle.com" <pjt@...gle.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"efault@....de" <efault@....de>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"Huangweidong (C)" <weidong.huang@...wei.com>,
"weiqi (C)" <weiqi4@...wei.com>, longpeng <longpeng2@...wei.com>
Subject: Re: [PATCH] sched/fair: cfs quota cause large schedule latency
On Mon, Jul 16, 2018 at 07:08:41AM +0000, Xiexiangyou wrote:
> The virtual machine has the following cgroup hierarchy:
>
>            root
>              |
>            vm_tg
>           (cfs_rq)
>            /    \
>         (se)    (se)
>         tg_A    tg_B
>       (cfs_rq) (cfs_rq)
>          /        \
>        (se)      (se)
>         a          b
>
> a and b are the two vcpus of the VM.
>
>
>
> When we set a cfs quota on vm_tg, the scheduling latency of the vcpus (a/b) can become very large, more than 2 s.
>
>
>
> perf sched test result:
>
>  Task            | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> -----------------------------------------------------------------------------------------------------------------
>  CPU 0/KVM:49609 | 260.261 ms |       50 | avg:   82.017 ms | max: 2510.990 ms | max at: 43335.555886 s
>  .....
>
>
>
> We added some trace points and found that the following sequence leads to the issue:
>
> - 'a' is the only task of tg_A. When 'a' goes to sleep, tg_A is dequeued and tg_A->se->load.weight = MIN_SHARES.
> - 'b' continues running and then triggers the throttle; tg_A->cfs_rq->throttle_count = 1.
> - Some task wakes up 'a'. When tg_A is enqueued, tg_A->se->load.weight cannot be updated because tg_A->cfs_rq->throttle_count = 1.
> - After one cfs quota period, vm_tg is unthrottled.
> - 'a' runs.
> - After one tick, when tg_A->se's vruntime is updated, tg_A->se->load.weight is still MIN_SHARES, so tg_A->se's vruntime grows by a large amount.
> - That causes 'a' to have a large scheduling latency.
>
> The fix is as follows:
>
> Signed-off-by: Xiangyou Xie <xiexiangyou@...wei.com<mailto:xiexiangyou@...wei.com>>

The above Changelog violates just about every formatting rule ever
invented. Also you got your email format wrong.

The patch might be OK, but at this point I really can't do anything with
it anyway.

> ---
> kernel/sched/fair.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2f0a0be..348ccd6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3016,9 +3016,6 @@ static void update_cfs_group(struct sched_entity *se)
>  	if (!gcfs_rq)
>  		return;
>
> -	if (throttled_hierarchy(gcfs_rq))
> -		return;
> -
>  #ifndef CONFIG_SMP
>  	runnable = shares = READ_ONCE(gcfs_rq->tg->shares);
>
> --
> 1.8.3.1
>
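
For readers following the quoted changelog, the effect of a stale, very small weight on vruntime can be sketched as below. This is a simplified model, not the kernel's calc_delta_fair(): the helper name sketch_vruntime_delta() and the constant SKETCH_NICE_0_LOAD are hypothetical, and the real scheduler uses scaled weights and fixed-point inverse weights rather than a plain division. It only illustrates that the vruntime charged for a slice of runtime is inversely proportional to the entity's weight.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed weight of a nice-0 entity in this simplified, unscaled model. */
#define SKETCH_NICE_0_LOAD 1024ULL

/*
 * Hypothetical helper (not a kernel function): charge vruntime for
 * delta_exec_ns of real runtime, scaled by NICE_0_LOAD / weight.
 * A smaller weight makes vruntime advance faster.
 */
static uint64_t sketch_vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
	assert(weight != 0);	/* a zero weight is not meaningful here */
	return delta_exec_ns * SKETCH_NICE_0_LOAD / weight;
}
```

Under these assumptions, a 4 ms slice charged at weight 1024 advances vruntime by 4 ms, while the same slice charged at an illustrative minimal weight of 2 advances it by 512 times as much. That is the kind of jump the changelog describes when tg_A->se->load.weight is left at MIN_SHARES.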