linux-kernel - [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20190620150555.15717-1-patrick.bellasi@arm.com>
Date:   Thu, 20 Jun 2019 16:05:55 +0100
From:   Patrick Bellasi <patrick.bellasi@....com>
To:     linux-kernel@...r.kernel.org
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Douglas Raillard <douglas.raillard@....com>,
        Quentin Perret <quentin.perret@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases

The estimated utilization for a task is currently defined based on:
 - enqueued: the utilization value at the end of the last activation
 - ewma:     an exponential moving average which samples are the enqueued values

According to this definition, when a task suddenly change it's bandwidth
requirements from small to big, the EWMA will need to collect multiple
samples before converging up to track the new big utilization.

Moreover, after the PELT scale invariance update [1], in the above scenario we
can see that the utilization of the task has a significant drop from the first
big activation to the following one. That's implied by the new "time-scaling"
mechanisms instead of the previous "delta-scaling" approach.

Unfortunately, these drops cannot be fully absorbed by the current util_est
implementation. Indeed, the low-frequency filtering introduced by the "ewma" is
entirely useless while converging up and it does not help in stabilizing sooner
the PELT signal.

To make util_est do better service in the above scenario, do change its
definition to slow down only utilization decreases. Do that by resetting the
"ewma" every time the last collected sample increases.

This change makes also the default util_est implementation more aligned with
the major scheduler behavior, which is to optimize for performance.
In the future, this implementation can be further refined to consider
task specific hints.

[1] sched/fair: Update scale invariance of PELT
    Message-ID: <tip-23127296889fe84b0762b191b5d041e8ba6f2599@....kernel.org>

Signed-off-by: Patrick Bellasi <patrick.bellasi@....com>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
---
 kernel/sched/fair.c     | 14 +++++++++++++-
 kernel/sched/features.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c11dcdedcbc..27b33caaaaf4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3685,11 +3685,22 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (ue.enqueued & UTIL_AVG_UNCHANGED)
 		return;
 
+	/*
+	 * Reset EWMA on utilization increases, the moving average is used only
+	 * to smooth utilization decreases.
+	 */
+	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
+	if (sched_feat(UTIL_EST_FASTUP)) {
+		if (ue.ewma < ue.enqueued) {
+			ue.ewma = ue.enqueued;
+			goto done;
+		}
+	}
+
 	/*
 	 * Skip update of task's estimated utilization when its EWMA is
 	 * already ~1% close to its last activation value.
 	 */
-	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
 	last_ewma_diff = ue.enqueued - ue.ewma;
 	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
 		return;
@@ -3722,6 +3733,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
 	ue.ewma  += last_ewma_diff;
 	ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
+done:
 	WRITE_ONCE(p->se.avg.util_est, ue);
 }
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 2410db5e9a35..7481cd96f391 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -89,3 +89,4 @@ SCHED_FEAT(WA_BIAS, true)
  * UtilEstimation. Use estimated CPU utilization.
  */
 SCHED_FEAT(UTIL_EST, true)
+SCHED_FEAT(UTIL_EST_FASTUP, true)
-- 
2.21.0