Message-ID: <Z6M5fQB9P1_bDF7A@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 5 Feb 2025 11:12:13 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Thierry Reding <treding@...dia.com>, Waiman Long <longman@...hat.com>,
Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Phil Auld <pauld@...hat.com>, Qais Yousef <qyousef@...alina.io>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
for hotplug

On 05/02/25 07:53, Juri Lelli wrote:
> On 03/02/25 11:01, Jon Hunter wrote:
> > Hi Juri,
> >
> > On 16/01/2025 15:55, Juri Lelli wrote:
> > > On 16/01/25 13:14, Jon Hunter wrote:
>
> ...
>
> > > > [ 210.595431] dl_bw_manage: cpu=5 cap=3072 fair_server_bw=52428 total_bw=209712 dl_bw_cpus=4
> > > > [ 210.606269] dl_bw_manage: cpu=4 cap=2048 fair_server_bw=52428 total_bw=157284 dl_bw_cpus=3
> > > > [ 210.617281] dl_bw_manage: cpu=3 cap=1024 fair_server_bw=52428 total_bw=104856 dl_bw_cpus=2
> > > > [ 210.627205] dl_bw_manage: cpu=2 cap=1024 fair_server_bw=52428 total_bw=262140 dl_bw_cpus=2
> > > > [ 210.637752] dl_bw_manage: cpu=1 cap=0 fair_server_bw=52428 total_bw=262140 dl_bw_cpus=1
> > > ^
> > > Different than before but still not what I expected. Looks like there
> > > are conditions/paths I currently cannot replicate on my setup, so this
> > > needs more thought. Unfortunately I will be out traveling next week,
> > > so this might require a bit of time.
> >
> >
> > I see that this is now in mainline and our board is still failing to
> > suspend. Let me know if there is anything else you need me to test.
>
> Ah, can you actually add 'sched_verbose' to your kernel cmdline? It
> should print out additional debug info on the console when domains get
> reconfigured by hotplug/suspend, e.g.
>
> dl_bw_manage: cpu=3 cap=3072 fair_server_bw=52428 total_bw=209712 dl_bw_cpus=4
> CPU0 attaching NULL sched-domain.
> CPU3 attaching NULL sched-domain.
> CPU4 attaching NULL sched-domain.
> CPU5 attaching NULL sched-domain.
> CPU0 attaching sched-domain(s):
> domain-0: span=0,4-5 level=MC
> groups: 0:{ span=0 cap=766 }, 4:{ span=4 cap=908 }, 5:{ span=5 cap=989 }
> CPU4 attaching sched-domain(s):
> domain-0: span=0,4-5 level=MC
> groups: 4:{ span=4 cap=908 }, 5:{ span=5 cap=989 }, 0:{ span=0 cap=766 }
> CPU5 attaching sched-domain(s):
> domain-0: span=0,4-5 level=MC
> groups: 5:{ span=5 cap=989 }, 0:{ span=0 cap=766 }, 4:{ span=4 cap=908 }
> root domain span: 0,4-5
> rd 0,4-5: Checking EAS, CPUs do not have asymmetric capacities
> psci: CPU3 killed (polled 0 ms)
>
> Can you please share this information as well if you are able to collect
> it (while still running with my last proposed fix)?

Also, if you don't mind, add the following on top of the existing
changes.

Just to be sure we don't get out of sync, I pushed the current set to
https://github.com/jlelli/linux.git experimental/dl-debug

---
kernel/sched/deadline.c | 2 +-
kernel/sched/topology.c | 5 ++++-
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 9a47decd099a..504ff302299a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3545,7 +3545,7 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
* dl_servers we can discount, as tasks will be moved out the
* offlined CPUs anyway.
*/
- printk_deferred("%s: cpu=%d cap=%lu fair_server_bw=%llu total_bw=%llu dl_bw_cpus=%d\n", __func__, cpu, cap, fair_server_bw, dl_b->total_bw, dl_bw_cpus(cpu));
+ printk_deferred("%s: cpu=%d cap=%lu fair_server_bw=%llu total_bw=%llu dl_bw_cpus=%d type=%s span=%*pbl\n", __func__, cpu, cap, fair_server_bw, dl_b->total_bw, dl_bw_cpus(cpu), (cpu_rq(cpu)->rd == &def_root_domain) ? "DEF" : "DYN", cpumask_pr_args(cpu_rq(cpu)->rd->span));
if (dl_b->total_bw - fair_server_bw > 0) {
/*
* Leaving at least one CPU for DEADLINE tasks seems a
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 93b08e76a52a..996270cd5bd2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -137,6 +137,7 @@ static void sched_domain_debug(struct sched_domain *sd, int cpu)

if (!sd) {
printk(KERN_DEBUG "CPU%d attaching NULL sched-domain.\n", cpu);
+ printk(KERN_CONT "span=%*pbl\n", cpumask_pr_args(def_root_domain.span));
return;
}

@@ -2534,8 +2535,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
if (has_cluster)
static_branch_inc_cpuslocked(&sched_cluster_active);

- if (rq && sched_debug_verbose)
+ if (rq && sched_debug_verbose) {
pr_info("root domain span: %*pbl\n", cpumask_pr_args(cpu_map));
+ pr_info("default domain span: %*pbl\n", cpumask_pr_args(def_root_domain.span));
+ }

ret = 0;
error:
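
For anyone following along, the check this printk instruments boils down
to comparing the remaining root-domain capacity against the DEADLINE
bandwidth left once the outgoing CPU's fair server is discounted. Below
is a rough userspace model of that arithmetic (the names dl_overflow()
and rd_bw are mine; it only mirrors the shape of __dl_overflow() and
cap_scale(), assumes the default 95% admission ratio, and skips the
"leave one CPU for DEADLINE tasks" special case), fed with the last two
steps of Jon's log above:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define BW_SHIFT		20	/* bandwidth fixed point, as in the kernel */
#define SCHED_CAPACITY_SHIFT	10	/* capacity fixed point: 1024 per CPU */

/* scale a bandwidth value by a capacity value, like cap_scale() */
static uint64_t cap_scale(uint64_t bw, uint64_t cap)
{
	return (bw * cap) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Same shape as __dl_overflow() with new_bw = 0 and old_bw = the
 * fair server bandwidth being discounted for the outgoing CPU.
 */
static bool dl_overflow(uint64_t rd_bw, uint64_t cap,
			uint64_t total_bw, uint64_t fair_server_bw)
{
	return cap_scale(rd_bw, cap) < total_bw - fair_server_bw;
}

int main(void)
{
	/* default admission ratio: 95% of one CPU's worth of bandwidth */
	uint64_t rd_bw = 95 * (1ULL << BW_SHIFT) / 100;
	/* last two offline steps from the log above */
	struct step { int cpu; uint64_t cap, total_bw; } steps[] = {
		{ 2, 1024, 262140 },
		{ 1,    0, 262140 },
	};

	for (unsigned int i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
		printf("cpu=%d -> overflow=%d\n", steps[i].cpu,
		       (int)dl_overflow(rd_bw, steps[i].cap,
					steps[i].total_bw, 52428));
	return 0;
}

With cap=0 at the cpu=1 step the check necessarily trips as long as
total_bw minus the discounted fair server bandwidth is still positive,
which is consistent with the suspend failure; the odd part remains
total_bw jumping back to 262140 at the cpu=2 step instead of continuing
to decrease.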