Message-ID: <aFpYl53ZMThWjQai@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 24 Jun 2025 09:49:43 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: luca abeni <luca.abeni@...tannapisa.it>
Cc: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>,
linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vineeth Pillai <vineeth@...byteword.org>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with
SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)
On 20/06/25 18:52, luca abeni wrote:
> On Fri, 20 Jun 2025 17:28:28 +0200
> Juri Lelli <juri.lelli@...hat.com> wrote:
>
> > On 20/06/25 16:16, luca abeni wrote:
> [...]
> > > So, I had a look trying to remember the situation... This is my
> > > current understanding:
> > > - the max_bw field should be just the maximum amount of CPU
> > > bandwidth we want to use with reclaiming... It is rt_runtime_us /
> > > rt_period_us; I guess it is cached in this field just to avoid
> > > computing it every time.
> > > So, max_bw should be updated only when
> > > /proc/sys/kernel/sched_rt_{runtime,period}_us are written
> > > - the extra_bw field represents an additional amount of CPU
> > > bandwidth we can reclaim on each core (the original m-GRUB
> > > algorithm just reclaimed Uinact, the utilization of inactive tasks).
> > > It is initialized to Umax when no SCHED_DEADLINE tasks exist and
> >
> > Is Umax == max_bw from above?
>
> Yes; sorry about the confusion
>
>
> > > should be decreased by Ui when a task with utilization Ui becomes
> > > SCHED_DEADLINE (and increased by Ui when the SCHED_DEADLINE task
> > > terminates or changes scheduling policy). Since this value is
> > > per-core, Ui is divided by the number of cores in the root
> > > domain... From what you write, I guess extra_bw is not correctly
> > > initialized/updated when a new root domain is created?
> >
> > It looks like so, yeah. After boot and when domains are dynamically
> > created. But I am still not 100% sure; I only see weird numbers that I
> > struggle to relate to what you say above. :)
>
> BTW, when running some tests on different machines I think I found out
> that 6.11 does not exhibit this issue (this needs to be confirmed, I am
> working on reproducing the test with different kernels on the same
> machine)
>
> If I manage to reproduce this result, I think I can bisect to find the
> commit introducing the issue (git is telling me that I'll need about 15
> tests :)
> So, stay tuned...
The following seems to at least cure the problem after boot. Things are
still broken after cpuset creation. Moving on to look into that, but I
wanted to share where I am so that we don't duplicate work.
Rationale for the below is that we currently end up calling
__dl_update() with 'cpus' that are not stable yet. So, I tried to move
initialization after SMP is up (all CPUs have been onlined).
---
 kernel/sched/core.c     |  3 +++
 kernel/sched/deadline.c | 39 +++++++++++++++++++++++----------------
 kernel/sched/sched.h    |  1 +
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a38..d152f8a84818b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8470,6 +8470,8 @@ void __init sched_init_smp(void)
 	init_sched_rt_class();
 	init_sched_dl_class();
 
+	sched_init_dl_servers();
+
 	sched_smp_initialized = true;
 }
 
@@ -8484,6 +8486,7 @@ early_initcall(migration_init);
 void __init sched_init_smp(void)
 {
 	sched_init_granularity();
+	sched_init_dl_servers();
 }
 #endif /* CONFIG_SMP */
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ad45a8fea245e..9f3b3f3592a58 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1647,22 +1647,6 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 {
 	struct rq *rq = dl_se->rq;
 
-	/*
-	 * XXX: the apply do not work fine at the init phase for the
-	 * fair server because things are not yet set. We need to improve
-	 * this before getting generic.
-	 */
-	if (!dl_server(dl_se)) {
-		u64 runtime = 50 * NSEC_PER_MSEC;
-		u64 period = 1000 * NSEC_PER_MSEC;
-
-		dl_server_apply_params(dl_se, runtime, period, 1);
-
-		dl_se->dl_server = 1;
-		dl_se->dl_defer = 1;
-		setup_new_dl_entity(dl_se);
-	}
-
 	if (!dl_se->dl_runtime)
 		return;
 
@@ -1693,6 +1677,29 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 	dl_se->server_pick_task = pick_task;
 }
 
+void sched_init_dl_servers(void)
+{
+	int cpu;
+	struct rq *rq;
+	struct sched_dl_entity *dl_se;
+
+	for_each_online_cpu(cpu) {
+		u64 runtime = 50 * NSEC_PER_MSEC;
+		u64 period = 1000 * NSEC_PER_MSEC;
+
+		rq = cpu_rq(cpu);
+		dl_se = &rq->fair_server;
+
+		WARN_ON(dl_server(dl_se));
+
+		dl_server_apply_params(dl_se, runtime, period, 1);
+
+		dl_se->dl_server = 1;
+		dl_se->dl_defer = 1;
+		setup_new_dl_entity(dl_se);
+	}
+}
+
 void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
 {
 	u64 new_bw = dl_se->dl_bw;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295e..22301c28a5d2d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -384,6 +384,7 @@ extern void dl_server_stop(struct sched_dl_entity *dl_se);
 extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick_task);
+extern void sched_init_dl_servers(void);
 
 extern void dl_server_update_idle_time(struct rq *rq,
 		    struct task_struct *p);
--
2.49.0