Message-Id: <20230515025716.316888-3-vineeth@bitbyteword.org>
Date: Sun, 14 May 2023 22:57:13 -0400
From: Vineeth Pillai <vineeth@...byteword.org>
To: luca.abeni@...tannapisa.it, Juri Lelli <juri.lelli@...hat.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Joel Fernandes <joel@...lfernandes.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Cc: Vineeth Pillai <vineeth@...byteword.org>,
Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP
In a multi-processor system, the total reserved bandwidth is spread
equally across all CPUs when computing the free bandwidth available for
reclaiming. This causes problems: "Uextra" is the same for every CPU in
a root domain, while running_bw differs depending on the reserved
bandwidth of the tasks running on each CPU, so reclaiming becomes
disproportionate. A task with a smaller reservation reclaims less even
when it is the only task running on its CPU.
Following is a small test with three tasks with reservations (8,10),
(1,10) and (1,100), each running on a different CPU (a minimal sketch of
one such test task follows the numbers below). Because the reclamation
logic derives the reclaimable bandwidth from the globally available
bandwidth, tasks with smaller reservations reclaim much less than tasks
with larger ones, even when their CPU has free bandwidth available to be
reclaimed.
TID[730]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.05
TID[731]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 31.34
TID[732]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 3.16
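For reference (not part of this patch), a minimal sketch of how one such
reclaiming test task could be set up, assuming a kernel/glibc combination
that exposes __NR_sched_setattr; struct sched_attr is defined locally to
mirror the uapi layout since glibc provides no wrapper:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE		6
#endif
#ifndef SCHED_FLAG_RECLAIM
#define SCHED_FLAG_RECLAIM	0x02	/* enable GRUB reclaiming */
#endif

/* Mirrors the uapi definition of struct sched_attr. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	/* The (1ms, 100ms) reservation from the test above. */
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_DEADLINE,
		.sched_flags	= SCHED_FLAG_RECLAIM,
		.sched_runtime	=   1 * 1000 * 1000,
		.sched_deadline	= 100 * 1000 * 1000,
		.sched_period	= 100 * 1000 * 1000,
	};

	/* Needs CAP_SYS_NICE / root. */
	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}

	for (;;)
		;	/* busy loop; observe how much CPU is reclaimed */
}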
Fix: use the bandwidth available on each CPU to calculate the
reclaimable bandwidth. Admission control already takes care of the total
bandwidth, so using the per-CPU available bandwidth does not break the
deadline guarantees. A short illustration of the resulting arithmetic
follows the numbers below.
With this fix, the above test behaves as follows:
TID[586]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.24
TID[585]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 95.01
TID[584]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.01
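For illustration only (not from the patch, and assuming the default
max_bw of 0.95 from sched_rt_runtime_us / sched_rt_period_us), this is
where the ~95% figure for the lone (1ms, 100ms) task comes from with the
fixed rule dq = -(running_bw / max_bw) dt:

#include <stdio.h>

int main(void)
{
	double runtime_ms = 1.0, period_ms = 100.0;
	double running_bw = runtime_ms / period_ms;	/* 0.01: only task on its CPU */
	double max_bw = 0.95;		/* sched_rt_runtime_us / sched_rt_period_us */

	/*
	 * dq = -(running_bw / max_bw) dt: the budget depletes max_bw /
	 * running_bw = 95 times slower than wall-clock time, so the 1ms
	 * budget stretches to ~95ms of execution per 100ms period.
	 */
	double exec_per_period_ms = runtime_ms * (max_bw / running_bw);

	printf("execution per %.0fms period: %.1fms (%.2f%% of the CPU)\n",
	       period_ms, exec_per_period_ms,
	       100.0 * exec_per_period_ms / period_ms);
	return 0;
}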
Signed-off-by: Vineeth Pillai (Google) <vineeth@...byteword.org>
---
kernel/sched/deadline.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 91451c1c7e52..85902c4c484b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1272,7 +1272,7 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
* Umax: Max usable bandwidth for DL. Currently
* = sched_rt_runtime_us / sched_rt_period_us
* Uextra: Extra bandwidth not reserved:
- * = Umax - \Sum(u_i / #cpus in the root domain)
+ * = Umax - this_bw
* u_i: Bandwidth of an admitted dl task in the
* root domain.
*
@@ -1286,22 +1286,14 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
*/
static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
{
- u64 u_act;
- u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
-
/*
- * Instead of computing max{u, (rq->dl.max_bw - u_inact - u_extra)},
- * we compare u_inact + rq->dl.extra_bw with
- * rq->dl.max_bw - u, because u_inact + rq->dl.extra_bw can be larger
- * than rq->dl.max_bw (so, rq->dl.max_bw - u_inact - rq->dl.extra_bw
- * would be negative leading to wrong results)
+ * max{u, Umax - Uinact - Uextra}
+ * = max{u, max_bw - (this_bw - running_bw) - (max_bw - this_bw)}
+ * = max{u, running_bw} = running_bw  (since u <= running_bw)
+ * So dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt
+ * = -(running_bw / max_bw) dt
*/
- if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
- u_act = dl_se->dl_bw;
- else
- u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
-
- return div64_u64(delta * u_act, rq->dl.max_bw);
+ return div64_u64(delta * rq->dl.running_bw, rq->dl.max_bw);
}
/*
--
2.40.1