Message-ID: <5d7e5c02-00ee-4891-a8cf-09abe3e089e1@nvidia.com>
Date: Fri, 10 Jan 2025 18:40:42 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutny <mkoutny@...e.com>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 Phil Auld <pauld@...hat.com>, Qais Yousef <qyousef@...alina.io>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 "Joel Fernandes (Google)" <joel@...lfernandes.org>,
 Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
 Shin Kawamura <kawasin@...gle.com>,
 Vineeth Remanan Pillai <vineeth@...byteword.org>,
 linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
 "linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
 for hotplug

Hi Juri,

On 10/01/2025 15:45, Juri Lelli wrote:
> Hi Jon,
> 
> On 10/01/25 11:52, Jon Hunter wrote:
>> Hi Juri,
>>
> 
> ...
> 
>> I have noticed a suspend regression on one of our Tegra boards and bisect is
>> pointing to this commit. If I revert this on top of -next then I don't see
>> the issue.
>>
>> The only messages I see when suspend fails are ...
>>
>> [   53.905976] Error taking CPU1 down: -16
>> [   53.909887] Non-boot CPUs are not disabled
>>
>> So far this is only happening on Tegra186 (ARM64). Let me know if you have
>> any thoughts.
> 
> Are you running any DEADLINE task in your configuration?

Not that I am aware of.

> In any case, could you please repro with the following (as a start)?
> It should print additional debugging info on the console.
> 
> Thanks!
> Juri
> 
> ---
>   kernel/sched/deadline.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 62192ac79c30..77736bab1992 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -3530,6 +3530,7 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
>   		 * dl_servers we can discount, as tasks will be moved out the
>   		 * offlined CPUs anyway.
>   		 */
> +		printk_deferred("%s: cpu=%d cap=%lu fair_server_bw=%llu total_bw=%llu dl_bw_cpus=%d\n", __func__, cpu, cap, fair_server_bw, dl_b->total_bw, dl_bw_cpus(cpu));
>   		if (dl_b->total_bw - fair_server_bw > 0) {
>   			/*
>   			 * Leaving at least one CPU for DEADLINE tasks seems a
> 

With the above I see the following ...

[   53.919672] dl_bw_manage: cpu=5 cap=3072 fair_server_bw=52428 total_bw=209712 dl_bw_cpus=4
[   53.930608] dl_bw_manage: cpu=4 cap=2048 fair_server_bw=52428 total_bw=157284 dl_bw_cpus=3
[   53.941601] dl_bw_manage: cpu=3 cap=1024 fair_server_bw=52428 total_bw=104856 dl_bw_cpus=2
[   53.952186] dl_bw_manage: cpu=2 cap=1024 fair_server_bw=52428 total_bw=576708 dl_bw_cpus=2
[   53.962938] dl_bw_manage: cpu=1 cap=0 fair_server_bw=52428 total_bw=576708 dl_bw_cpus=1
[   53.971068] Error taking CPU1 down: -16
[   53.974912] Non-boot CPUs are not disabled
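
To make the numbers above easier to compare, here is a minimal sketch of the arithmetic, not kernel code. The struct and both helper names are hypothetical and exist only for this illustration; the values are copied from the debug lines, and the subtraction mirrors the expression in the `if (dl_b->total_bw - fair_server_bw > 0)` condition from the quoted hunk. It is only meant to show how total_bw relates to fair_server_bw in each line, not to explain why.

```c
#include <assert.h>
#include <stdint.h>

/* One dl_bw_manage debug line; field names follow the printk format string. */
struct dl_bw_sample {
	int cpu;
	unsigned long cap;
	uint64_t fair_server_bw;
	uint64_t total_bw;
	int dl_bw_cpus;
};

/*
 * Hypothetical helper: the left-hand side of the condition in the quoted
 * hunk, i.e. what remains of total_bw after subtracting fair_server_bw.
 */
static uint64_t bw_beyond_fair_server(const struct dl_bw_sample *s)
{
	return s->total_bw - s->fair_server_bw;
}

/*
 * Hypothetical helper: how many fair_server_bw-sized shares total_bw
 * corresponds to (integer division).
 */
static uint64_t fair_server_shares(const struct dl_bw_sample *s)
{
	assert(s->fair_server_bw != 0);
	return s->total_bw / s->fair_server_bw;
}
```

For the cpu=5, cpu=4 and cpu=3 lines, total_bw is exactly dl_bw_cpus times fair_server_bw (4, 3 and 2 shares). For the cpu=2 and cpu=1 lines, total_bw is 576708, which is 11 times fair_server_bw, while dl_bw_cpus is 2 and then 1. For the cpu=1 line the subtraction gives 524280, so the condition in the hunk is true there.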

Thanks
Jon

-- 
nvpublic

