Message-ID: <5d7e5c02-00ee-4891-a8cf-09abe3e089e1@nvidia.com>
Date: Fri, 10 Jan 2025 18:40:42 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutny <mkoutny@...e.com>,
Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Phil Auld <pauld@...hat.com>, Qais Yousef <qyousef@...alina.io>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
for hotplug
Hi Juri,
On 10/01/2025 15:45, Juri Lelli wrote:
> Hi Jon,
>
> On 10/01/25 11:52, Jon Hunter wrote:
>> Hi Juri,
>>
>
> ...
>
>> I have noticed a suspend regression on one of our Tegra boards and bisect is
>> pointing to this commit. If I revert this on top of -next then I don't see
>> the issue.
>>
>> The only messages I see when suspend fails are ...
>>
>> [ 53.905976] Error taking CPU1 down: -16
>> [ 53.909887] Non-boot CPUs are not disabled
>>
>> So far this is only happening on Tegra186 (ARM64). Let me know if you have
>> any thoughts.
>
> Are you running any DEADLINE task in your configuration?
Not that I am aware of.
> In any case, could you please repro with the following (as a start)?
> It should print additional debugging info on the console.
>
> Thanks!
> Juri
>
> ---
> kernel/sched/deadline.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 62192ac79c30..77736bab1992 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -3530,6 +3530,7 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
> * dl_servers we can discount, as tasks will be moved out the
> * offlined CPUs anyway.
> */
> + printk_deferred("%s: cpu=%d cap=%lu fair_server_bw=%llu total_bw=%llu dl_bw_cpus=%d\n", __func__, cpu, cap, fair_server_bw, dl_b->total_bw, dl_bw_cpus(cpu));
> if (dl_b->total_bw - fair_server_bw > 0) {
> /*
> * Leaving at least one CPU for DEADLINE tasks seems a
>
With the above applied, I see the following ...
[ 53.919672] dl_bw_manage: cpu=5 cap=3072 fair_server_bw=52428 total_bw=209712 dl_bw_cpus=4
[ 53.930608] dl_bw_manage: cpu=4 cap=2048 fair_server_bw=52428 total_bw=157284 dl_bw_cpus=3
[ 53.941601] dl_bw_manage: cpu=3 cap=1024 fair_server_bw=52428 total_bw=104856 dl_bw_cpus=2
[ 53.952186] dl_bw_manage: cpu=2 cap=1024 fair_server_bw=52428 total_bw=576708 dl_bw_cpus=2
[ 53.962938] dl_bw_manage: cpu=1 cap=0 fair_server_bw=52428 total_bw=576708 dl_bw_cpus=1
[ 53.971068] Error taking CPU1 down: -16
[ 53.974912] Non-boot CPUs are not disabled
Thanks
Jon
--
nvpublic