Message-ID: <ZyuUcJDPBln1BK1Y@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 6 Nov 2024 17:08:16 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
Waiman Long <longman@...hat.com>
Subject: Re: [PATCH] dl_server: Reset DL server params when rd changes
On 04/11/24 17:41, Joel Fernandes wrote:
> On Mon, Nov 04, 2024 at 11:54:36AM +0100, Juri Lelli wrote:
...
> > I added a printk in __dl_server_attach_root which is called after the
> > dynamic rd is built to transfer bandwidth to it.
> >
> > __dl_server_attach_root came with d741f297bceaf ("sched/fair: Fair
> > server interface"), do you have this change in your backport?
>
> You nailed it! Our 5.15 backport appears to be slightly older and is missing
> this from topology.c as you mentioned. Thanks for clarifying!
>
>
> /*
> * Because the rq is not a task, dl_add_task_root_domain() did not
> * move the fair server bw to the rd if it already started.
> * Add it now.
> */
> if (rq->fair_server.dl_server)
> __dl_server_attach_root(&rq->fair_server, rq);
>
> >
> > > So if rd changes during boot initialization, the correct dl_bw has to be
> > > updated AFAICS. Also if cpusets are used, the rd for a CPU may change.
> >
> > cpusets changes are something that I still need to double check. Will
> > do.
> >
>
> Sounds good, that would be good to verify.
So, I played a little bit with it and came up with a simple sequence
of operations that points out an issue (default Fedora Server
install):
echo Y >/sys/kernel/debug/sched/verbose
echo +cpuset >/sys/fs/cgroup/cgroup.subtree_control
echo 0-7 > /sys/fs/cgroup/user.slice/cpuset.cpus
echo 6-7 > /sys/fs/cgroup/user.slice/cpuset.cpus.exclusive
echo root >/sys/fs/cgroup/user.slice/cpuset.cpus.partition
The domains are rebuilt correctly, but we end up with a null total_bw.
The conditional call above correctly takes care of adding the per-rq
dl_server bandwidth back when we go from the single domain to the two
exclusive ones, but I noticed that we go through
partition_sched_domains_locked() twice for a single write of 'root',
and the second pass, since it's not actually destroying/rebuilding
anything, resets total_bw without adding the dl_server contribution
back.
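Just to make the direction concrete, below is a rough sketch (untested,
not even compile-tested, helper name and call site made up) of what I
mean by adding the contribution back: after a pass that clears the root
domain's DL accounting without destroying/rebuilding domains, walk the
rd span and re-add the fair server bandwidth of every active CPU. The
real fix might well belong elsewhere (e.g. in dl_clear_root_domain()
itself or in the rd accounting rebuild path).

/*
 * Illustrative sketch only, not a proposed patch: re-add the fair
 * server contribution for every active CPU in the root domain after
 * its DL accounting has been cleared, so a rebuild that doesn't
 * destroy/recreate domains doesn't leave total_bw null.
 *
 * Caller is expected to hold rd->dl_bw.lock, as dl_clear_root_domain()
 * does today when it zeroes total_bw.
 */
static void dl_server_restore_rd_bw(struct root_domain *rd)
{
	int i;

	for_each_cpu(i, rd->span) {
		struct sched_dl_entity *dl_se = &cpu_rq(i)->fair_server;

		/* Only account servers that have actually been started. */
		if (dl_se->dl_server && cpu_active(i))
			rd->dl_bw.total_bw += dl_se->dl_bw;
	}
}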
Now, I'm not completely sure why we need to go through
partition_sched_domains_locked() twice, as we have (it also looked
like a pattern from other call paths)
update_prstate()
-> update_cpumasks_hier()
-> rebuild_sched_domains_locked() <- right at the end
-> update_partition_sd_lb()
-> rebuild_sched_domains_locked() <- right after the above call
Removing the first call does indeed fix the issue and the domains look
OK, but I'm pretty sure I'm missing all sorts of details and corner
cases.
Waiman (now Cc-ed), maybe you can help here with understanding why the
two back-to-back calls are needed?
Thanks!
Juri