lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 May 2017 22:13:35 +0200
From:   luca abeni <luca.abeni@...tannapisa.it>
To:     linux-kernel@...r.kernel.org
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@....com>,
        Claudio Scordino <claudio@...dence.eu.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tommaso Cucinotta <tommaso.cucinotta@...up.it>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Mathieu Poirier <mathieu.poirier@...aro.org>,
        Luca Abeni <luca.abeni@...tannapisa.it>
Subject: [PATCH 08/10] sched/deadline: base GRUB reclaiming on the inactive utilization

From: Luca Abeni <luca.abeni@...tannapisa.it>

Instead of decreasing the runtime as "dq = -Uact dt" (eventually
divided by the maximum utilization available for deadline tasks),
decrease it as "dq = -max{u, (1 - Uinact)} dt", where u is the task
utilization and Uinact is the "inactive utilization".
In this way, the maximum fraction of CPU time that can be reclaimed
is given by the total utilization of deadline tasks.
This approach solves a fairness issue with "traditional" global GRUB
reclaiming: using the traditional GRUB algorithm, if tasks are
allocated to the various cores in a non-uniform way, the
reclaiming mechanism allows some tasks to reclaim more time than
others. This issue is visible starting 11 time-consuming tasks with
runtime 10ms and period 30ms (total utilization 3.666) on a 4-cores
system: some tasks will receive much more than the reserved runtime
(thanks to the reclaiming mechanism), while other tasks will receive
less than the reserved runtime.

Signed-off-by: Luca Abeni <luca.abeni@...tannapisa.it>
Tested-by: Daniel Bristot de Oliveira <bristot@...hat.com>
---
 kernel/sched/deadline.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c224c0dd..4286bb8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -955,26 +955,30 @@ extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 /*
  * This function implements the GRUB accounting rule:
  * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as "dq = -Uact dt", where
- * Uact is the (per-runqueue) active utilization.
- * Since rq->dl.running_bw contains Uact * 2^BW_SHIFT, the result
- * has to be shifted right by BW_SHIFT.
- * To reclaim only a fraction Umax of the CPU time, the
- * runtime accounting rule is modified as
- * "dq = -Uact / Umax dt"; since rq->dl.bw_ratio contains
- * 2^RATIO_SHIFT / Umax, delta is multiplied by bw_ratio and shifted
- * right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * not decreased as "dq = -dt", but as "dq = -max{u, (1 - Uinact)} dt",
+ * where u is the utilization of the task and Uinact is the
+ * (per-runqueue) inactive utilization, computed as the difference
+ * between the "total runqueue utilization" and the runqueue
+ * active utilization.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
+ * multiplied by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
  */
-u64 grub_reclaim(u64 delta, struct rq *rq)
+u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	delta *= rq->dl.running_bw;
-	delta *= rq->dl.bw_ratio;
-	delta >>= BW_SHIFT + RATIO_SHIFT;
+	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
+	u64 u_act;
 
-	return delta;
+	/*
+	 * Instead of computing max{u, (1 - u_inact)}, we compare
+	 * u_inact with 1 - u, because u_inact can be larger than 1
+	 * (so, 1 - u_inact would be negative leading to wrong results)
+	 */
+	if (u_inact > BW_UNIT - dl_se->dl_bw)
+		u_act = dl_se->dl_bw;
+	else
+		u_act = BW_UNIT - u_inact;
+
+	return (delta * u_act) >> BW_SHIFT;
 }
 
 /*
@@ -1020,7 +1024,7 @@ static void update_curr_dl(struct rq *rq)
 	sched_rt_avg_update(rq, delta_exec);
 
 	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
-		delta_exec = grub_reclaim(delta_exec, rq);
+		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
 	dl_se->runtime -= delta_exec;
 
 throttle:
-- 
2.7.4

Powered by blists - more mailing lists