[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241104174109.GA1044726@google.com>
Date: Mon, 4 Nov 2024 17:41:09 +0000
From: Joel Fernandes <joel@...lfernandes.org>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>
Subject: Re: [PATCH] dl_server: Reset DL server params when rd changes
On Mon, Nov 04, 2024 at 11:54:36AM +0100, Juri Lelli wrote:
> On 30/10/24 19:50, Joel Fernandes wrote:
>
> ...
>
> > With some prints [1] in the kernel, we can see on boot:
> >
> > $ dmesg|grep appl
> > [ 0.930337] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
> > [ 0.949025] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
> > [ 0.953026] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=2, cap=2048, init=1
> > [ 0.957024] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=3, cap=3072, init=1
> > [ 0.961023] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=4, cap=4096, init=1
> > [ 0.965030] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=5, cap=5120, init=1
> > [ 0.969024] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=6, cap=6144, init=1
> > [ 0.973024] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=7, cap=7168, init=1
> >
> > For the 8th apply_params, the 8th CPU is not considered. This is because
> > set_cpu_active() for the 8th CPU has not yet happened as mentioned in commit
> > message.
> >
> > With the patch:
> >
> > $ dmesg|grep appl
> > [ 0.961169] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
> > [ 0.981936] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
> > [ 0.985836] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=2, cap=2048, init=1
> > [ 0.989835] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=3, cap=3072, init=1
> > [ 0.993840] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=4, cap=4096, init=1
> > [ 0.997835] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=5, cap=5120, init=1
> > [ 1.001838] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=6, cap=6144, init=1
> > [ 1.005834] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=7, cap=7168, init=1
> >
> > [ ... here somewhere rd changes as topology init finishes, then all the
> > params are replied, this time with the correct rd. ]
> >
> > [ 1.009903] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.012409] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.014269] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.019865] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.054908] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.081865] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.108861] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> > [ 1.136944] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
> >
> > The -EBUSY happens for our 5.15 backport. I see dl_b->total_bw to be 0
> > without my patch. Even if the -EBUSY doesn't happen for you (perhaps due to
> > compiler or other differences), shouldn't we use the correct rd for
> > apply_params? The dl_bw is tied to the rd via cpu_rq(cpu)->rd->dl_bw;
>
> I think I am still seeing something different.
>
> [ 0.184629] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=0, dl_b->bw=996147
> [ 0.185498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=1024, cap_scale=996147 dl_bw->total_bw=0, old_bw=0, new_bw=52428
> [ 0.371531] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=52428, dl_b->bw=996147
> [ 0.372498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=1024, cap_scale=996147 dl_bw->total_bw=52428, old_bw=0, new_bw=52428
> [ 0.373541] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=2, cap=2048, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=104856, dl_b->bw=996147
> [ 0.374498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=2048, cap_scale=1992294 dl_bw->total_bw=104856, old_bw=0, new_bw=52428
> [ 0.375507] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=3, cap=3072, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=157284, dl_b->bw=996147
> [ 0.376498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=3072, cap_scale=2988441 dl_bw->total_bw=157284, old_bw=0, new_bw=52428
> [ 0.377507] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=4, cap=4096, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=209712, dl_b->bw=996147
> [ 0.378498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=4096, cap_scale=3984588 dl_bw->total_bw=209712, old_bw=0, new_bw=52428
> [ 0.379505] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=5, cap=5120, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=262140, dl_b->bw=996147
> [ 0.380498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=5120, cap_scale=4980735 dl_bw->total_bw=262140, old_bw=0, new_bw=52428
> [ 0.381504] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=6, cap=6144, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=314568, dl_b->bw=996147
> [ 0.382498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=6144, cap_scale=5976882 dl_bw->total_bw=314568, old_bw=0, new_bw=52428
> [ 0.384527] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=7, cap=7168, init=1, dl_b=ffffffffb837c080, dl_b->total_bw=366996, dl_b->bw=996147
> [ 0.385498] __dl_overflow: dl_b=ffffffffb837c080, dl_b->bw=996147, cap=7168, cap_scale=6973029 dl_bw->total_bw=366996, old_bw=0, new_bw=52428
> ...
> [ 0.388556] __dl_server_attach_root: cpu=0, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=52428, dl_b->bw=996147
> [ 0.389507] __dl_server_attach_root: cpu=1, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=104856, dl_b->bw=996147
> [ 0.390501] __dl_server_attach_root: cpu=2, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=157284, dl_b->bw=996147
> [ 0.391503] __dl_server_attach_root: cpu=3, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=209712, dl_b->bw=996147
> [ 0.392499] __dl_server_attach_root: cpu=4, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=262140, dl_b->bw=996147
> [ 0.393499] __dl_server_attach_root: cpu=5, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=314568, dl_b->bw=996147
> [ 0.394428] __dl_server_attach_root: cpu=6, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=366996, dl_b->bw=996147
> [ 0.394499] __dl_server_attach_root: cpu=7, new_bw=52428, dl_b=ffff949bc080e040, dl_b->total_bw=419424, dl_b->bw=996147
>
>
> I added a printk in __dl_server_attach_root which is called after the
> dynamic rd is built to transfer bandwidth to it.
>
> __dl_server_attach_root came with d741f297bceaf ("sched/fair: Fair
> server interface"), do you have this change in your backport?
You nailed it! Our 5.15 backport appears to be slightly older and is missing
this from topology.c as you mentioned. Thanks for clarifying!
/*
* Because the rq is not a task, dl_add_task_root_domain() did not
* move the fair server bw to the rd if it already started.
* Add it now.
*/
if (rq->fair_server.dl_server)
__dl_server_attach_root(&rq->fair_server, rq);
>
> > So if rd changes during boot initialization, the correct dl_bw has to be
> > updated AFAICS. Also if cpusets are used, the rd for a CPU may change.
>
> cpusets changes are something that I still need to double check. Will
> do.
>
Sounds good, that would be good to verify.
Thanks,
- Joel
Powered by blists - more mailing lists