Message-ID: <150729050549.744832.17481160177674200884.stgit@buzz>
Date: Fri, 06 Oct 2017 14:48:25 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org
Cc: Tejun Heo <tj@...nel.org>
Subject: [PATCH RFC] sched/cgroup: allow overcommit of rt group runtime
Currently the group RT scheduler enforces a strict non-overcommit policy:

  sum(child_runtime / child_period) <= parent_runtime / parent_period

This is reasonable for true real-time applications, but it makes
configuration very complicated in messy, nested containerized environments.

This patch adds the scheduler feature RT_RUNTIME_OVERCOMMIT, which replaces
the strict policy with restrictions similar to CFS bandwidth: a non-infinite
child runtime must not exceed the parent's runtime limit:

  max(child_runtime / child_period) <= parent_runtime / parent_period

An infinite runtime in a child is also allowed if the parent runtime is
non-zero. That is, a zero RT runtime (the default) forbids realtime tasks
inside the hierarchy.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
---
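To make the difference between the two policies concrete, here is a minimal
userspace sketch. It is not kernel code: struct rt_group, ratio(), strict_ok()
and overcommit_ok() are hypothetical names used only for illustration, and
ratio() is loosely modelled on the kernel's to_ratio() fixed-point helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define RUNTIME_INF	UINT64_MAX	/* "unlimited" runtime marker */
#define BW_SHIFT	20		/* fixed-point fraction bits */

/* Hypothetical model of one group's RT bandwidth settings. */
struct rt_group {
	uint64_t period;	/* e.g. microseconds */
	uint64_t runtime;	/* same unit, or RUNTIME_INF */
};

/* runtime / period as a fixed-point fraction; unlimited maps to 1.0. */
static uint64_t ratio(uint64_t period, uint64_t runtime)
{
	if (runtime == RUNTIME_INF)
		return 1ULL << BW_SHIFT;
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/* Strict policy: the children's ratios together must fit in the parent. */
static bool strict_ok(const struct rt_group *parent,
		      const struct rt_group *children, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += ratio(children[i].period, children[i].runtime);

	return sum <= ratio(parent->period, parent->runtime);
}

/*
 * Overcommit policy: each child is checked on its own. An unlimited
 * child only needs a parent with non-zero runtime; any other child
 * must not exceed the parent's ratio.
 */
static bool overcommit_ok(const struct rt_group *parent,
			  const struct rt_group *child)
{
	if (child->runtime == RUNTIME_INF)
		return parent->runtime != 0;

	return ratio(child->period, child->runtime) <=
	       ratio(parent->period, parent->runtime);
}
```

With a parent of 500000/1000000 and two children of 400000/1000000 each, the
strict check rejects the configuration (0.8 > 0.5), while the overcommit check
accepts each child individually (0.4 <= 0.5). A child of 600000/1000000 is
rejected under both policies.
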
 Documentation/scheduler/sched-rt-group.txt |   12 ++++++++++++
 kernel/sched/features.h                    |    1 +
 kernel/sched/rt.c                          |   26 ++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index d8fce3e78457..123117e86051 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -145,6 +145,18 @@ For now, this can be simplified to just the following (but see Future plans):
\Sum_{i} runtime_{i} <= global_runtime
+2.4 Overcommit behaviour
+------------------------
+
+Feature RT_RUNTIME_OVERCOMMIT disables the strict non-overcommit behaviour and
+requires only that each child's runtime ratio not exceed its parent's:
+
+	child_runtime / child_period <= parent_runtime / parent_period
+
+An infinite child runtime is also allowed if the parent runtime is non-zero.
+
+I.e. a zero rt runtime (the default) forbids realtime tasks in the hierarchy.
+
3. Future plans
===============
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index d3fb15555291..aa1ddb35adac 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -78,6 +78,7 @@ SCHED_FEAT(RT_PUSH_IPI, true)
#endif
SCHED_FEAT(RT_RUNTIME_SHARE, true)
+SCHED_FEAT(RT_RUNTIME_OVERCOMMIT, true)
SCHED_FEAT(LB_MIN, false)
SCHED_FEAT(ATTACH_AGE_LOAD, true)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e25b460d051f..e2c269394456 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2486,6 +2486,7 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	struct task_group *child;
 	unsigned long total, sum = 0;
 	u64 period, runtime;
+	u64 p_period, p_runtime;
 	period = ktime_to_ns(tg->rt_bandwidth.rt_period);
 	runtime = tg->rt_bandwidth.rt_runtime;
@@ -2509,6 +2510,31 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	total = to_ratio(period, runtime);
+	if (tg->parent == d->tg) {
+		p_period = d->rt_period;
+		p_runtime = d->rt_runtime;
+	} else if (tg->parent) {
+		p_period = ktime_to_ns(tg->parent->rt_bandwidth.rt_period);
+		p_runtime = tg->parent->rt_bandwidth.rt_runtime;
+	} else {
+		p_period = global_rt_period();
+		p_runtime = global_rt_runtime();
+	}
+
+	/*
+	 * Child runtime should not exceed parent runtime,
+	 * but infinite runtime allowed if parent runtime is non-zero.
+	 */
+	if (sched_feat(RT_RUNTIME_OVERCOMMIT)) {
+		if (runtime == RUNTIME_INF) {
+			if (!p_runtime)
+				return -EINVAL;
+		} else if (total > to_ratio(p_period, p_runtime))
+			return -EINVAL;
+
+		return 0;
+	}
+
 	/*
 	 * Nobody can have more than the global setting allows.
 	 */