Date:   Fri, 20 Jul 2018 15:15:55 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Xiexiangyou <xiexiangyou@...wei.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "pjt@...gle.com" <pjt@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "efault@....de" <efault@....de>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "Huangweidong (C)" <weidong.huang@...wei.com>,
        "weiqi (C)" <weiqi4@...wei.com>, longpeng <longpeng2@...wei.com>
Subject: Re: [PATCH] sched/fair: cfs quota cause large schedule latency

On Mon, Jul 16, 2018 at 07:08:41AM +0000, Xiexiangyou wrote:
> The virtual machine has the following cgroup hierarchy:
> 
>              root
>               |
>             vm_tg
>            (cfs_rq)
>            /      \
>          (se)      (se)
>          tg_A      tg_B
>        (cfs_rq)  (cfs_rq)
>          /            \
>        (se)           (se)
>         a              b
> 
> 'a' and 'b' are the two vcpu threads of the VM.
> 
> We set a cfs quota on vm_tg, and the scheduling latency of the vcpus
> (a/b) can become very large, more than 2 s.
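> 
> For illustration, such a setup can be created through the cgroup-v1
> cpu controller; the mount point, group names and values in this
> sketch are assumptions, not taken from our environment:
> 
>     /* sketch: create vm_tg with tg_A/tg_B and cap vm_tg at 50% */
>     #include <stdio.h>
>     #include <sys/stat.h>
> 
>     static void write_str(const char *path, const char *val)
>     {
>             FILE *f = fopen(path, "w");
> 
>             if (f) {
>                     fputs(val, f);
>                     fclose(f);
>             }
>     }
> 
>     int main(void)
>     {
>             mkdir("/sys/fs/cgroup/cpu/vm_tg", 0755);
>             mkdir("/sys/fs/cgroup/cpu/vm_tg/tg_A", 0755);
>             mkdir("/sys/fs/cgroup/cpu/vm_tg/tg_B", 0755);
> 
>             /* 50 ms of runtime per 100 ms period on the parent group */
>             write_str("/sys/fs/cgroup/cpu/vm_tg/cpu.cfs_period_us", "100000");
>             write_str("/sys/fs/cgroup/cpu/vm_tg/cpu.cfs_quota_us", "50000");
> 
>             /* the vcpu thread ids then go into tg_A/tasks, tg_B/tasks */
>             return 0;
>     }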
> 
> The perf sched test result shows:
> 
>   Task             |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
>   ------------------------------------------------------------------------------------------------------------
>   CPU 0/KVM:49609  |    260.261 ms |       50 | avg:   82.017 ms | max: 2510.990 ms | max at:  43335.555886 s
>   .....
> 
> We added some trace points and found that the following sequence
> leads to the issue:
> 
> - 'a' is the only task of tg_A; when 'a' goes to sleep, tg_A is
>   dequeued and tg_A->se->load.weight = MIN_SHARES.
> - 'b' keeps running and then triggers the throttle:
>   tg_A->cfs_rq->throttle_count = 1.
> - Some task wakes 'a' up; when tg_A is enqueued,
>   tg_A->se->load.weight cannot be updated because
>   tg_A->cfs_rq->throttle_count = 1.
> - After one cfs quota period, vm_tg is unthrottled.
> - 'a' is running.
> - After one tick, when tg_A->se's vruntime is updated,
>   tg_A->se->load.weight is still MIN_SHARES, so tg_A->se's vruntime
>   grows by a large value (see the sketch below).
> - That causes 'a' to see a large scheduling latency.
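> 
> A back-of-the-envelope sketch of the amplification (the constants are
> illustrative, in unscaled weight units, not measured from the trace):
> calc_delta_fair() advances vruntime by roughly
> delta_exec * NICE_0_LOAD / se->load.weight, so a weight stuck at the
> MIN_SHARES floor inflates every tick:
> 
>     #include <stdio.h>
> 
>     int main(void)
>     {
>             unsigned long nice_0_load = 1024; /* unscaled NICE_0 weight     */
>             unsigned long min_shares  = 2;    /* MIN_SHARES floor in fair.c */
>             unsigned long tick_ms     = 4;    /* e.g. one tick at HZ=250    */
> 
>             /* 4 * 1024 / 2 = 2048 ms of vruntime for 4 ms of runtime,
>              * the same order as the 2.5 s max delay seen above */
>             printf("%lu ms of vruntime\n",
>                    tick_ms * nice_0_load / min_shares);
>             return 0;
>     }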
> 
> The fix patch is as follows:
> 
> Signed-off-by: Xiangyou Xie <xiexiangyou@...wei.com>

The above Changelog violates just about every formatting rule ever
invented. Also you got your email format wrong.

The patch might be OK, but at this point I really can't do anything with
it anyway.

> ---
> kernel/sched/fair.c | 3 ---
> 1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2f0a0be..348ccd6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3016,9 +3016,6 @@ static void update_cfs_group(struct sched_entity *se)
>         if (!gcfs_rq)
>                 return;
> 
> -       if (throttled_hierarchy(gcfs_rq))
> -               return;
> -
> #ifndef CONFIG_SMP
>         runnable = shares = READ_ONCE(gcfs_rq->tg->shares);
> 
> --
> 1.8.3.1
> 
