Message-ID: <aOeFmZIBIj6HDiIZ@jlelli-thinkpadt14gen4.remote.csb>
Date: Thu, 9 Oct 2025 11:51:21 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Yuri Andriaccio <yurand2000@...il.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org,
Luca Abeni <luca.abeni@...tannapisa.it>,
Yuri Andriaccio <yuri.andriaccio@...tannapisa.it>
Subject: Re: [RFC PATCH v3 13/24] sched/rt: Update rt-cgroup schedulability
checks
Hello,
On 29/09/25 11:22, Yuri Andriaccio wrote:
> From: luca abeni <luca.abeni@...tannapisa.it>
>
> Update schedulability checks and setup of runtime/period for rt-cgroups.
So, it looks like the changelogs are all a bit too minimal and dry. A
more comprehensive (but still concise) description of what each patch
does, why it does it and how, will help with review and is essential in
the future for tracing back design decisions and issues. Please
consider this for the current and previous/subsequent patches.
> Co-developed-by: Alessio Balsini <a.balsini@...up.it>
> Signed-off-by: Alessio Balsini <a.balsini@...up.it>
> Co-developed-by: Andrea Parri <parri.andrea@...il.com>
> Signed-off-by: Andrea Parri <parri.andrea@...il.com>
> Co-developed-by: Yuri Andriaccio <yurand2000@...il.com>
> Signed-off-by: Yuri Andriaccio <yurand2000@...il.com>
> Signed-off-by: luca abeni <luca.abeni@...tannapisa.it>
> ---
> kernel/sched/core.c | 6 ++++
> kernel/sched/deadline.c | 46 +++++++++++++++++++++++----
> kernel/sched/rt.c | 70 +++++++++++++++++++++++------------------
> kernel/sched/sched.h | 1 +
> 4 files changed, 87 insertions(+), 36 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2cfbe3b7b17..1217f714dd2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9281,6 +9281,12 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> return &root_task_group.css;
> }
>
> + /* Do not allow cpu_cgroup hierarchies with depth greater than 2. */
I believe this limit gets removed with later patches, but since we have
it here, we should state why we have the limit in place.
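Maybe something along these lines (just a sketch, and assuming I
understand the restriction correctly, i.e. at this point in the series
each rt-cgroup is backed by its own dl-server and nesting of servers is
not handled yet):

	/*
	 * Only a single level of rt-cgroups below root is supported
	 * for now: each group is served by a dedicated dl-server and
	 * nested servers are not handled yet, so refuse deeper
	 * hierarchies.
	 */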
> +#ifdef CONFIG_RT_GROUP_SCHED
> + if (parent != &root_task_group)
> + return ERR_PTR(-EINVAL);
> +#endif
> +
> tg = sched_create_group(parent);
> if (IS_ERR(tg))
> return ERR_PTR(-ENOMEM);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 1293b9a252b..5d93b3ca030 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -347,7 +347,47 @@ void cancel_inactive_timer(struct sched_dl_entity *dl_se)
> cancel_dl_timer(dl_se, &dl_se->inactive_timer);
> }
>
> +/*
> + * Used for dl_bw check and update, used under sched_rt_handler()::mutex and
> + * sched_domains_mutex.
> + */
> +u64 dl_cookie;
> +
> #ifdef CONFIG_RT_GROUP_SCHED
> +int dl_check_tg(unsigned long total)
> +{
> + unsigned long flags;
> + int which_cpu;
> + int cpus;
> + struct dl_bw *dl_b;
> + u64 gen = ++dl_cookie;
> +
> + for_each_possible_cpu(which_cpu) {
> + rcu_read_lock_sched();
> +
> + if (!dl_bw_visited(which_cpu, gen)) {
> + cpus = dl_bw_cpus(which_cpu);
> + dl_b = dl_bw_of(which_cpu);
> +
> + raw_spin_lock_irqsave(&dl_b->lock, flags);
> +
> + if (dl_b->bw != -1 &&
> + dl_b->bw * cpus < dl_b->total_bw + total * cpus) {
Does this need to use cap_scale()?
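I was thinking of something along the lines of what __dl_overflow()
does, i.e. comparing against the capacity-scaled bandwidth (untested
sketch; the `total * cpus' term would then probably need to be
expressed in terms of capacity as well):

	cap = dl_bw_capacity(which_cpu);

	if (dl_b->bw != -1 &&
	    cap_scale(dl_b->bw, cap) < dl_b->total_bw + total * cpus) {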
> + raw_spin_unlock_irqrestore(&dl_b->lock, flags);
> + rcu_read_unlock_sched();
> +
> + return 0;
> + }
> +
> + raw_spin_unlock_irqrestore(&dl_b->lock, flags);
> + }
> +
> + rcu_read_unlock_sched();
> + }
> +
> + return 1;
> +}
> +
...
> @@ -2034,8 +2029,8 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
> unsigned long total, sum = 0;
> u64 period, runtime;
>
> - period = ktime_to_ns(tg->rt_bandwidth.rt_period);
> - runtime = tg->rt_bandwidth.rt_runtime;
> + period = tg->dl_bandwidth.dl_period;
> + runtime = tg->dl_bandwidth.dl_runtime;
Just as an example, this is the kind of important change (rt_bandwidth
-> dl_bandwidth) that usually deserves to be explicitly mentioned in the
changelog.
...
> @@ -2097,6 +2096,20 @@ static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
> .rt_runtime = runtime,
> };
>
> + /*
> + * Since we truncate DL_SCALE bits, make sure we're at least
> + * that big.
> + */
> + if (runtime != 0 && runtime < (1ULL << DL_SCALE))
> + return -EINVAL;
> +
> + /*
^
Nit, fix alignment.
> + * Since we use the MSB for wrap-around and sign issues, make
> + * sure it's not set (mind that period can be equal to zero).
> + */
> + if (period & (1ULL << 63))
> + return -EINVAL;
> +
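FWIW, spelling out the numbers here might help future readers: with
DL_SCALE = 10 the first check rejects any non-zero runtime below
1 << 10 = 1024ns (~1us), while the second rejects any period with bit
63 set, i.e. >= 2^63ns (~292 years). For instance:

	runtime = 500;		/* non-zero and < 1024ns -> -EINVAL */
	runtime = 0;		/* bandwidth disabled -> accepted */
	period  = 1ULL << 63;	/* MSB set -> -EINVAL */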
Thanks,
Juri