Message-ID: <aO-uqIZRS3qqsuN6@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 15 Oct 2025 16:24:40 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Yuri Andriaccio <yurand2000@...il.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org,
Luca Abeni <luca.abeni@...tannapisa.it>,
Yuri Andriaccio <yuri.andriaccio@...tannapisa.it>
Subject: Re: [RFC PATCH v3 18/24] sched/deadline: Allow deeper hierarchies of
RT cgroups
Hello,
On 29/09/25 11:22, Yuri Andriaccio wrote:
> From: luca abeni <luca.abeni@...tannapisa.it>
>
> Allow creation of cgroup hierachies with depth greater than two.
> Add check to prevent attaching tasks to a child cgroup of an active cgroup (i.e.
> with a running FIFO/RR task).
> Add check to prevent attaching tasks to cgroups which have children with
> non-zero runtime.
> Update rt-cgroups allocated bandwidth accounting for nested cgroup hierachies.
>
> Co-developed-by: Yuri Andriaccio <yurand2000@...il.com>
> Signed-off-by: Yuri Andriaccio <yurand2000@...il.com>
> Signed-off-by: luca abeni <luca.abeni@...tannapisa.it>
> ---
> kernel/sched/core.c | 6 -----
> kernel/sched/deadline.c | 51 +++++++++++++++++++++++++++++++++++++----
> kernel/sched/rt.c | 16 ++++++++++---
> kernel/sched/sched.h | 3 ++-
> 4 files changed, 62 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 6f516cdc7bb..d1d7215c4a2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9281,12 +9281,6 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> return &root_task_group.css;
> }
>
> - /* Do not allow cpu_cgroup hierachies with depth greater than 2. */
> -#ifdef CONFIG_RT_GROUP_SCHED
> - if (parent != &root_task_group)
> - return ERR_PTR(-EINVAL);
> -#endif
> -
> tg = sched_create_group(parent);
> if (IS_ERR(tg))
> return ERR_PTR(-ENOMEM);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5d93b3ca030..abe11985c41 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -388,11 +388,42 @@ int dl_check_tg(unsigned long total)
> return 1;
> }
>
> -void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
> +bool is_active_sched_group(struct task_group *tg)
I wonder if the function name is misleading: it checks whether any child
has non-zero runtime, not whether there are tasks in the group.
> {
> + struct task_group *child;
> + bool is_active = 1;
> +
> + // if there are no children, this is a leaf group, thus it is active
> + list_for_each_entry_rcu(child, &tg->children, siblings) {
> + if (child->dl_bandwidth.dl_runtime > 0) {
> + is_active = 0;
> + }
> + }
> + return is_active;
> +}
> +
> +static inline bool sched_group_has_active_siblings(struct task_group *tg)
> +{
> + struct task_group *child;
> + bool has_active_siblings = 0;
> +
> + // if there are no children, this is a leaf group, thus it is active
Copy-pasta from above? :) Also, not the correct comment style (should be /* ... */).
> + list_for_each_entry_rcu(child, &tg->parent->children, siblings) {
> + if (child != tg && child->dl_bandwidth.dl_runtime > 0) {
> + has_active_siblings = 1;
> + }
> + }
> + return has_active_siblings;
> +}
> +
> +void dl_init_tg(struct task_group *tg, int cpu, u64 rt_runtime, u64 rt_period)
> +{
> + struct sched_dl_entity *dl_se = tg->dl_se[cpu];
> struct rq *rq = container_of(dl_se->dl_rq, struct rq, dl);
> - int is_active;
> - u64 new_bw;
> + int is_active, is_active_group;
> + u64 old_runtime, new_bw;
> +
> + is_active_group = is_active_sched_group(tg);
>
> raw_spin_rq_lock_irq(rq);
> is_active = dl_se->my_q->rt.rt_nr_running > 0;
> @@ -400,8 +431,10 @@ void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
> update_rq_clock(rq);
> dl_server_stop(dl_se);
>
> + old_runtime = dl_se->dl_runtime;
> new_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> - dl_rq_change_utilization(rq, dl_se, new_bw);
> + if (is_active_group)
> + dl_rq_change_utilization(rq, dl_se, new_bw);
>
> dl_se->dl_runtime = rt_runtime;
> dl_se->dl_deadline = rt_period;
> @@ -413,6 +446,16 @@ void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
> dl_se->dl_bw = new_bw;
> dl_se->dl_density = new_bw;
>
> + // add/remove the parent's bw
The comment style is not correct. Also, the comment itself is not very
informative. What about something like this (IIUC):
/*
* Handle parent bandwidth accounting when child runtime changes:
* - Disabling the last active child: parent becomes a leaf group,
* so add the parent's bandwidth back to active accounting
* - Enabling the first child: parent becomes a non-leaf group,
* so remove the parent's bandwidth from active accounting
* Only leaf groups (those without active children) should have
* non-zero bandwidth.
*/
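To make the transitions described in the comment above concrete, here is a small user-space model, offered as a sketch only and not as kernel code. Everything in it is an assumption made for illustration: struct group stands in for struct task_group, set_runtime() for dl_init_tg(), and the active_bw counter for the rq bandwidth updated via __add_rq_bw()/__sub_rq_bw(). The bandwidth is passed in directly instead of being computed with to_ratio(), and there is no locking or per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical user-space stand-in for struct task_group. */
struct group {
	struct group *parent;
	struct group *children[MAX_CHILDREN];
	size_t nr_children;
	uint64_t runtime;	/* models dl_bandwidth.dl_runtime */
	uint64_t bw;		/* models dl_se->dl_bw */
};

/* Models the bandwidth currently accounted on the rq. */
static uint64_t active_bw;

static void add_child(struct group *parent, struct group *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/* True if no child of @g has non-zero runtime (a leaf also qualifies). */
static bool has_no_active_children(const struct group *g)
{
	for (size_t i = 0; i < g->nr_children; i++) {
		if (g->children[i]->runtime > 0)
			return false;
	}
	return true;
}

/* True if any sibling of @g, excluding @g itself, has non-zero runtime. */
static bool has_active_siblings(const struct group *g)
{
	for (size_t i = 0; i < g->parent->nr_children; i++) {
		const struct group *c = g->parent->children[i];

		if (c != g && c->runtime > 0)
			return true;
	}
	return false;
}

/* Simplified model of the accounting done when a group's runtime changes. */
static void set_runtime(struct group *g, uint64_t runtime, uint64_t bw)
{
	uint64_t old_runtime = g->runtime;

	/* Only a group without active children accounts its own bandwidth. */
	if (has_no_active_children(g))
		active_bw = active_bw - g->bw + bw;

	g->runtime = runtime;
	g->bw = bw;

	/* Nothing to do for the root or for direct children of the root. */
	if (!g->parent || !g->parent->parent)
		return;
	if (has_active_siblings(g))
		return;

	if (runtime == 0 && old_runtime != 0)
		active_bw += g->parent->bw;	/* last active child disabled */
	else if (runtime != 0 && old_runtime == 0)
		active_bw -= g->parent->bw;	/* first child enabled */
}
```

In this model, with a parent at bandwidth 30 and two children at 10 and 20, enabling the first child takes the accounted total from 30 to 10, enabling the second brings it to 30, and disabling both children returns it to the parent's 30.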
> + if (tg->parent && tg->parent != &root_task_group)
> + {
> + if (rt_runtime == 0 && old_runtime != 0 && !sched_group_has_active_siblings(tg)) {
> + __add_rq_bw(tg->parent->dl_se[cpu]->dl_bw, dl_se->dl_rq);
> + } else if (rt_runtime != 0 && old_runtime == 0 && !sched_group_has_active_siblings(tg)) {
> + __sub_rq_bw(tg->parent->dl_se[cpu]->dl_bw, dl_se->dl_rq);
> + }
> + }
> +
Thanks,
Juri