Date:   Fri, 23 Sep 2016 16:30:25 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Matt Fleming <matt@...eblueprint.co.uk>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Yuyang Du <yuyang.du@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue

Hi Matt,

On 23 September 2016 at 13:58, Matt Fleming <matt@...eblueprint.co.uk> wrote:
> Since commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new
> tasks") ::last_update_time will be set to a non-zero value in
> post_init_entity_util_avg(), which leads to p->se.avg.load_avg being
> decayed on enqueue before the task has even had a chance to run.
>
> For a NICE_0 task, the sequence of events leading up to this, with
> example load average changes, might be,
>
>   sched_fork()
>     init_entity_runnable_average()
>       p->se.avg.load_avg = scale_load_down(se->load.weight);    // 1024
>
>   wake_up_new_task()
>     post_init_entity_util_avg()
>       attach_entity_load_avg()
>         p->se.last_update_time = cfs_rq->avg.last_update_time;
>
>     activate_task()
>       enqueue_task()
>         ...
>           enqueue_entity_load_avg()
>             migrated = !sa->last_update_time                    // false
>             if (!migrated)
>                     __update_load_avg()
>                       p->se.avg.load_avg = 1002

Does this mean that you see the perf drop you mention below because
the load is decayed to 1002 instead of staying at 1024?

The 1002 mainly comes from period_contrib being set to 1023 in
init_entity_runnable_average(), so any delay longer than 1us between
attach_entity_load_avg() and enqueue_entity_load_avg() crosses a
period boundary and triggers the decay of the load from 1024 to 1002.
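
As a sanity check on the arithmetic, here is a small stand-alone
user-space sketch of that single decay step (not kernel code; the
kernel uses a fixed-point lookup table rather than pow()):

  #include <math.h>
  #include <stdio.h>

  /*
   * One PELT decay step: a contribution halves every 32 periods of
   * 1024us, so crossing one period boundary scales the load by
   * y = 0.5^(1/32).
   */
  int main(void)
  {
          double y = pow(0.5, 1.0 / 32.0);          /* ~0.97857 */

          printf("1024 * y = %.0f\n", 1024.0 * y);  /* prints 1002 */
          return 0;
  }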

>
> This causes a performance regression for fork-intensive workloads like
> hackbench. When balancing on fork we can end up picking the same CPU
> to enqueue on over and over. This leads to huge congestion when trying
> to simultaneously wake up tasks that are all on the same runqueue, and
> causes lots of migrations on wake up.
>
> The behaviour since commit 7dc603c9028e essentially defeats the
> scheduler's attempt to balance on fork(). Before, ::runnable_load_avg
> likely had a non-zero value when the hackbench tasks were dequeued
> (the fork()'d tasks immediately block reading on pipe/socket), but now
> the load balancer sees the CPU as having no runnable load.

But this patch doesn't change the behaviour of runnable_load_avg, does
it? It only has an impact on the initial value of p->se.avg.load_avg
when the task is enqueued.

>
> Arguably the real problem is that balancing on fork doesn't look at
> the blocked contribution of tasks, only the runnable load, and it's
> possible for the two metrics to be wildly different on a relatively
> idle system.

fair enough

>
> But it still doesn't seem quite right to update a task's load_avg
> before it runs for the first time.
>
> Here are the results of running hackbench before 7dc603c9028e (old
> behaviour), with 7dc603c9028e applied (existing behaviour), and after
> 7dc603c9028e with this patch on top (new behaviour),
>
> hackbench-process-sockets
>
>                          4.7.0-rc5             4.7.0-rc5             4.7.0-rc5
>                             before          7dc603c9028e                 after
> Amean    1        0.0611 (  0.00%)      0.0693 (-13.32%)      0.0600 (  1.87%)
> Amean    4        0.1777 (  0.00%)      0.1730 (  2.65%)      0.1790 ( -0.72%)
> Amean    7        0.2771 (  0.00%)      0.2816 ( -1.60%)      0.2741 (  1.08%)
> Amean    12       0.3851 (  0.00%)      0.4167 ( -8.20%)      0.3751 (  2.60%)
>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Ingo Molnar <mingo@...nel.org>
> Cc: Mike Galbraith <umgwanakikbuti@...il.com>
> Cc: Yuyang Du <yuyang.du@...el.com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> Signed-off-by: Matt Fleming <matt@...eblueprint.co.uk>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8fb4d1942c14..4a2d3ff772f8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>         int migrated, decayed;
>
>         migrated = !sa->last_update_time;
> -       if (!migrated) {
> +       if (!migrated && se->sum_exec_runtime) {
>                 __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>                         se->on_rq * scale_load_down(se->load.weight),
>                         cfs_rq->curr == se, NULL);
> --
> 2.10.0
>
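
For what it's worth, below is a minimal stand-alone user-space model of
the guard this hunk adds (hypothetical struct and values, not the
kernel's types): a newly forked task has last_update_time set by
attach_entity_load_avg() but sum_exec_runtime still zero, so the decay
is skipped and the initial 1024 survives up to the first enqueue.

  #include <stdint.h>
  #include <stdio.h>

  /* hypothetical stand-ins for the relevant sched_entity fields */
  struct se_model {
          uint64_t last_update_time;
          uint64_t sum_exec_runtime;
          unsigned long load_avg;
  };

  static void enqueue_model(struct se_model *se)
  {
          int migrated = !se->last_update_time;

          /* the patched condition: only decay once the task has run */
          if (!migrated && se->sum_exec_runtime)
                  se->load_avg = 1002;  /* stands in for __update_load_avg() */
  }

  int main(void)
  {
          /* new task: attached, but has never run */
          struct se_model se = { 12345, 0, 1024 };

          enqueue_model(&se);
          printf("load_avg = %lu\n", se.load_avg);  /* stays 1024 */
          return 0;
  }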
