Date:   Fri, 23 Sep 2016 12:58:08 +0100
From:   Matt Fleming <matt@...eblueprint.co.uk>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>
Cc:     linux-kernel@...r.kernel.org,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Yuyang Du <yuyang.du@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>
Subject: [PATCH] sched/fair: Do not decay new task load on first enqueue

Since commit 7dc603c9028e ("sched/fair: Fix PELT integrity for new
tasks") ::last_update_time will be set to a non-zero value in
post_init_entity_util_avg(), which leads to p->se.avg.load_avg being
decayed on enqueue before the task has even had a chance to run.

For a NICE_0 task the sequence of events leading up to this, with
example load_avg values shown, might be,

  sched_fork()
    init_entity_runnable_average()
      p->se.avg.load_avg = scale_load_down(se->load.weight);	// 1024

  wake_up_new_task()
    post_init_entity_util_avg()
      attach_entity_load_avg()
        p->se.last_update_time = cfs_rq->avg.last_update_time;

    activate_task()
      enqueue_task()
        ...
          enqueue_entity_load_avg()
            migrated = !sa->last_update_time			// false
            if (!migrated)
                    __update_load_avg()
                      p->se.avg.load_avg = 1002
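
(The 1024 -> 1002 drop corresponds to roughly one elapsed PELT period:
__update_load_avg() decays contributions geometrically so that they
halve every LOAD_AVG_PERIOD (32) periods of 1024us, i.e. by a factor
of about 0.97857 per period. Below is a minimal user-space sketch of
that arithmetic, for illustration only; the kernel uses fixed-point
lookup tables rather than floating point:

  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
          /* PELT: a contribution halves after 32 periods of 1024us */
          double y = pow(0.5, 1.0 / 32.0);  /* per-period decay, ~0.97857 */
          double load = 1024.0;             /* scale_load_down(NICE_0_LOAD) */

          load *= y;                        /* ~one period between attach and enqueue */
          printf("%.0f\n", load);           /* prints 1002 */
          return 0;
  }
)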

This causes a performance regression for fork-intensive workloads like
hackbench. When balancing on fork() we can end up picking the same CPU
to enqueue on over and over. This leads to huge congestion when trying
to simultaneously wake up tasks that are all on the same runqueue, and
causes lots of migrations on wake up.

The behaviour since commit 7dc603c9028e essentially defeats the
scheduler's attempt to balance on fork(). Before that commit,
::runnable_load_avg likely had a non-zero value when the hackbench
tasks were dequeued (the fork()'d tasks immediately block reading on
pipe/socket), but now the load balancer sees the CPU as having no
runnable load.

Arguably the real problem is that balancing on fork() doesn't look at
the blocked contribution of tasks, only the runnable load, and it's
possible for the two metrics to be wildly different on a relatively
idle system.
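
As an illustration of how the two metrics can diverge (a simplified
sketch, not the kernel code; the field names roughly follow the v4.7
struct cfs_rq layout): a fork()'d hackbench task stops contributing to
->runnable_load_avg the moment it is dequeued, while its contribution
lingers in ->avg.load_avg as blocked load, which the fork-time
balancing path does not consult.

  struct cfs_rq_sketch {
          unsigned long runnable_load_avg;  /* runnable entities only */
          unsigned long load_avg;           /* runnable + blocked, ~cfs_rq->avg.load_avg */
  };

  void fork_then_block(struct cfs_rq_sketch *cfs, unsigned long task_load)
  {
          /* enqueue on fork(): both sums grow */
          cfs->runnable_load_avg += task_load;
          cfs->load_avg          += task_load;

          /* task blocks reading its pipe/socket: only the runnable sum drops */
          cfs->runnable_load_avg -= task_load;

          /*
           * Balancing on fork() looks only at the runnable load (now back
           * to 0), so the CPU looks idle and keeps attracting the next
           * fork()'d task.
           */
  }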

But it still doesn't seem quite right to update a task's load_avg
before it runs for the first time.

So only apply the decay on enqueue if ::sum_exec_runtime is non-zero,
i.e. if the task has actually run before.

Here are the results of running hackbench before 7dc603c9028e (old
behaviour), with 7dc603c9028e applied (exiting behaviour), and after
7dc603c9028e with this patch on top (new behaviour),

hackbench-process-sockets

                         4.7.0-rc5             4.7.0-rc5             4.7.0-rc5
                            before          7dc603c9028e                 after
Amean    1        0.0611 (  0.00%)      0.0693 (-13.32%)      0.0600 (  1.87%)
Amean    4        0.1777 (  0.00%)      0.1730 (  2.65%)      0.1790 ( -0.72%)
Amean    7        0.2771 (  0.00%)      0.2816 ( -1.60%)      0.2741 (  1.08%)
Amean    12       0.3851 (  0.00%)      0.4167 ( -8.20%)      0.3751 (  2.60%)

Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>
Cc: Mike Galbraith <umgwanakikbuti@...il.com>
Cc: Yuyang Du <yuyang.du@...el.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Signed-off-by: Matt Fleming <matt@...eblueprint.co.uk>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8fb4d1942c14..4a2d3ff772f8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	int migrated, decayed;
 
 	migrated = !sa->last_update_time;
-	if (!migrated) {
+	if (!migrated && se->sum_exec_runtime) {
 		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
 			se->on_rq * scale_load_down(se->load.weight),
 			cfs_rq->curr == se, NULL);
-- 
2.10.0
