Message-ID: <Z4fd_6M2vhSMSR0i@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 15 Jan 2025 17:10:39 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Thierry Reding <treding@...dia.com>, Waiman Long <longman@...hat.com>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Phil Auld <pauld@...hat.com>, Qais Yousef <qyousef@...alina.io>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
for hotplug
On 14/01/25 15:02, Juri Lelli wrote:
> On 14/01/25 13:52, Jon Hunter wrote:
> >
> > On 13/01/2025 09:32, Juri Lelli wrote:
> > > On 10/01/25 18:40, Jon Hunter wrote:
> > >
> > > ...
> > >
> > > > With the above I see the following ...
> > > >
> > > > [ 53.919672] dl_bw_manage: cpu=5 cap=3072 fair_server_bw=52428 total_bw=209712 dl_bw_cpus=4
> > > > [ 53.930608] dl_bw_manage: cpu=4 cap=2048 fair_server_bw=52428 total_bw=157284 dl_bw_cpus=3
> > > > [ 53.941601] dl_bw_manage: cpu=3 cap=1024 fair_server_bw=52428 total_bw=104856 dl_bw_cpus=2
> > >
> > > So far so good.
> > >
> > > > [ 53.952186] dl_bw_manage: cpu=2 cap=1024 fair_server_bw=52428 total_bw=576708 dl_bw_cpus=2
> > >
> > > But, this above doesn't sound right.
> > >
> > > > [ 53.962938] dl_bw_manage: cpu=1 cap=0 fair_server_bw=52428 total_bw=576708 dl_bw_cpus=1
> > > > [ 53.971068] Error taking CPU1 down: -16
> > > > [ 53.974912] Non-boot CPUs are not disabled
> > >
> > > What is the topology of your board?
> > >
> > > Are you using any cpuset configuration for partitioning CPUs?
> >
> >
> > I just noticed that by default we do boot this board with 'isolcpus=1-2'. I
> > see that this is a deprecated cmdline argument now and I must admit I don't
> > know the history of this for this specific board. It is quite old now.
> >
> > Thierry, I am curious if you have this set for Tegra186 or not? Looks like
> > our BSP (r35 based) sets this by default.
> >
> > I did try removing this and that does appear to fix it.
>
> OK, good.
>
> > Juri, let me know your thoughts.
>
> Thanks for the additional info. I guess I could now try to repro using
> isolcpus at boot on systems I have access to (to possibly understand
> what the underlying problem is).
I think the problem lies in the def_root_domain accounting of dl_servers
(which isolated CPUs remain attached to).
I came up with the following, which I'm not yet fully convinced by, but
could you please try it out on top of the debug patch and see how it
does with the original failing setup using isolcpus?
Thanks!
---
kernel/sched/deadline.c | 15 +++++++++++++++
kernel/sched/sched.h | 1 +
kernel/sched/topology.c | 3 +++
3 files changed, 19 insertions(+)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 77736bab1992..9a47decd099a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1709,6 +1709,21 @@ void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
__dl_add(dl_b, new_bw, dl_bw_cpus(cpu));
}
+void __dl_server_detach_root(struct sched_dl_entity *dl_se, struct rq *rq)
+{
+ u64 old_bw = dl_se->dl_bw;
+ int cpu = cpu_of(rq);
+ struct dl_bw *dl_b;
+
+ dl_b = dl_bw_of(cpu);
+ guard(raw_spinlock)(&dl_b->lock);
+
+ if (!dl_bw_cpus(cpu))
+ return;
+
+ __dl_sub(dl_b, old_bw, dl_bw_cpus(cpu));
+}
+
int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 period, bool init)
{
u64 old_bw = init ? 0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 65fa64845d9f..ec0dfd82157e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -395,6 +395,7 @@ extern void dl_server_update_idle_time(struct rq *rq,
struct task_struct *p);
extern void fair_server_init(struct rq *rq);
extern void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq);
+extern void __dl_server_detach_root(struct sched_dl_entity *dl_se, struct rq *rq);
extern int dl_server_apply_params(struct sched_dl_entity *dl_se,
u64 runtime, u64 period, bool init);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index da33ec9e94ab..93b08e76a52a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -495,6 +495,9 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
if (rq->rd) {
old_rd = rq->rd;
+ if (rq->fair_server.dl_server)
+ __dl_server_detach_root(&rq->fair_server, rq);
+
if (cpumask_test_cpu(rq->cpu, old_rd->online))
set_rq_offline(rq);
--