Message-ID: <Z7RZ4141H-FnoQPW@jlelli-thinkpadt14gen4.remote.csb>
Date: Tue, 18 Feb 2025 10:58:59 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Christian Loehle <christian.loehle@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Thierry Reding <treding@...dia.com>,
Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Phil Auld <pauld@...hat.com>, Qais Yousef <qyousef@...alina.io>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
for hotplug
Hi!
On 17/02/25 17:08, Juri Lelli wrote:
> On 14/02/25 10:05, Jon Hunter wrote:
...
> At this point I believe you triggered suspend.
>
> > [ 57.290150] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> > [ 57.335619] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC
> > [ 57.353364] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down
> > [ 57.397022] Disabling non-boot CPUs ...
>
> Offlining CPU5.
>
> > [ 57.400904] dl_bw_manage: cpu=5 cap=3072 fair_server_bw=52428 total_bw=209712 dl_bw_cpus=4 type=DYN span=0,3-5
> > [ 57.400949] CPU0 attaching NULL sched-domain.
> > [ 57.415298] span=1-2
> > [ 57.417483] __dl_sub: cpus=3 tsk_bw=52428 total_bw=157284 span=0,3-5 type=DYN
> > [ 57.417487] __dl_server_detach_root: cpu=0 rd_span=0,3-5 total_bw=157284
> > [ 57.417496] rq_attach_root: cpu=0 old_span=NULL new_span=1-2
> > [ 57.417501] __dl_add: cpus=3 tsk_bw=52428 total_bw=157284 span=0-2 type=DEF
> > [ 57.417504] __dl_server_attach_root: cpu=0 rd_span=0-2 total_bw=157284
> > [ 57.417507] CPU3 attaching NULL sched-domain.
> > [ 57.454804] span=0-2
> > [ 57.456987] __dl_sub: cpus=2 tsk_bw=52428 total_bw=104856 span=3-5 type=DYN
> > [ 57.456990] __dl_server_detach_root: cpu=3 rd_span=3-5 total_bw=104856
> > [ 57.456998] rq_attach_root: cpu=3 old_span=NULL new_span=0-2
> > [ 57.457000] __dl_add: cpus=4 tsk_bw=52428 total_bw=209712 span=0-3 type=DEF
> > [ 57.457003] __dl_server_attach_root: cpu=3 rd_span=0-3 total_bw=209712
> > [ 57.457006] CPU4 attaching NULL sched-domain.
> > [ 57.493964] span=0-3
> > [ 57.496152] __dl_sub: cpus=1 tsk_bw=52428 total_bw=52428 span=4-5 type=DYN
> > [ 57.496156] __dl_server_detach_root: cpu=4 rd_span=4-5 total_bw=52428
> > [ 57.496162] rq_attach_root: cpu=4 old_span=NULL new_span=0-3
> > [ 57.496165] __dl_add: cpus=5 tsk_bw=52428 total_bw=262140 span=0-4 type=DEF
> > [ 57.496168] __dl_server_attach_root: cpu=4 rd_span=0-4 total_bw=262140
> > [ 57.496171] CPU5 attaching NULL sched-domain.
> > [ 57.532952] span=0-4
> > [ 57.535143] rq_attach_root: cpu=5 old_span= new_span=0-4
> > [ 57.535147] __dl_add: cpus=5 tsk_bw=52428 total_bw=314568 span=0-5 type=DEF
>
> Maybe we shouldn't add the dl_server contribution of a CPU that is going
> to be offline.
I tried to implement this idea and ended up with the following. As usual,
I also pushed it to the branch on GitHub. Could you please update and
re-test?

Another thing I noticed is that, in my case, a hotplug operation that
triggers a sched/root domain rebuild ends up calling
dl_rebuild_rd_accounting() (from partition_and_rebuild_sched_domains()),
which resets accounting for the def and dyn domains. In your case (looking
again at the last dmesg you shared) I don't see this call, so I wonder
whether, for some reason related to your setup, the rebuild goes through
partition_sched_domains() (instead of
partition_and_rebuild_sched_domains()) and so doesn't call
dl_rebuild_rd_accounting() after partition_sched_domains_locked().
Maybe it should? Dietmar, Christian, Peter, what do you think?
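
For reference, the total_bw values in the quoted log are all multiples of
the per-CPU fair_server_bw (52428). The sketch below only reproduces that
arithmetic in userspace; it is not the patch. It assumes a fair-server
reservation of 50 ms runtime per 1 s period, which is consistent with the
52428 value in the log, and servers_total_bw() is a hypothetical helper
introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BW_SHIFT 20	/* bandwidth is a fixed-point fraction of 1 << 20 */

/* Fixed-point runtime/period ratio, modeled on the kernel's to_ratio(). */
static uint64_t to_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	assert(period_ns != 0);
	return (runtime_ns << BW_SHIFT) / period_ns;
}

/*
 * Hypothetical helper (not a kernel function): total bandwidth when each
 * of nr_cpus CPUs contributes exactly one dl_server share.
 */
static uint64_t servers_total_bw(unsigned int nr_cpus, uint64_t server_bw)
{
	return (uint64_t)nr_cpus * server_bw;
}
```

With these assumptions, to_ratio(1000000000, 50000000) gives 52428, four
shares give 209712 (the initial total_bw for span=0,3-5), five give
262140, and six give 314568. The last __dl_add line reports cpus=5 with
total_bw=314568, i.e. six shares accounted against five CPUs, which is
what skipping the contribution of the CPU going offline would avoid.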
Thanks,
Juri