linux-kernel - Re: [PATCH] dl_server: Reset DL server params when rd changes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zy3qszfGHO5PUa6W@jlelli-thinkpadt14gen4.remote.csb>
Date: Fri, 8 Nov 2024 11:40:51 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Waiman Long <llong@...hat.com>
Cc: Joel Fernandes <joel@...lfernandes.org>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Aashish Sharma <shraash@...gle.com>,
	Shin Kawamura <kawasin@...gle.com>,
	Vineeth Remanan Pillai <vineeth@...byteword.org>
Subject: Re: [PATCH] dl_server: Reset DL server params when rd changes

On 07/11/24 23:40, Waiman Long wrote:
> On 11/6/24 1:05 PM, Waiman Long wrote:
> > On 11/6/24 11:08 AM, Juri Lelli wrote:
> > > On 04/11/24 17:41, Joel Fernandes wrote:
> > > > On Mon, Nov 04, 2024 at 11:54:36AM +0100, Juri Lelli wrote:
> > > ...
> > > 
> > > > > I added a printk in __dl_server_attach_root which is called after the
> > > > > dynamic rd is built to transfer bandwidth to it.
> > > > > 
> > > > > __dl_server_attach_root came with d741f297bceaf ("sched/fair: Fair
> > > > > server interface"), do you have this change in your backport?
> > > > You nailed it! Our 5.15 backport appears to be slightly older
> > > > and is missing
> > > > this from topology.c as you mentioned. Thanks for clarifying!
> > > > 
> > > > 
> > > >          /*
> > > >           * Because the rq is not a task,
> > > > dl_add_task_root_domain() did not
> > > >           * move the fair server bw to the rd if it already started.
> > > >           * Add it now.
> > > >           */
> > > >          if (rq->fair_server.dl_server)
> > > > __dl_server_attach_root(&rq->fair_server, rq);
> > > > 
> > > > > > So if rd changes during boot initialization, the correct
> > > > > > dl_bw has to be
> > > > > > updated AFAICS. Also if cpusets are used, the rd for a
> > > > > > CPU may change.
> > > > > cpusets changes are something that I still need to double check. Will
> > > > > do.
> > > > > 
> > > > Sounds good, that would be good to verify.
> > > So, I played a little bit with it and came up with a simple set of ops
> > > that point out an issue (default fedora server install):
> > > 
> > > echo Y >/sys/kernel/debug/sched/verbose
> > > 
> > > echo +cpuset >/sys/fs/cgroup/cgroup.subtree_control
> > > 
> > > echo 0-7 > /sys/fs/cgroup/user.slice/cpuset.cpus
> > > echo 6-7 > /sys/fs/cgroup/user.slice/cpuset.cpus.exclusive
> > > echo root >/sys/fs/cgroup/user.slice/cpuset.cpus.partition
> > > 
> > > The domains are rebuilt correctly, but we end up with a null total_bw.
> > > 
> > > The conditional call above takes care correctly of adding back dl_server
> > > per-rq bandwidth when we pass from the single domain to the 2 exclusive
> > > ones, but I noticed that we go through partition_sched_domains_locked()
> > > twice for a single write of 'root' and the second one, since it's not
> > > actually destroying/rebuilding anything, is resetting total_bw w/o
> > > addition dl_server contribution back.
> > > 
> > > Now, not completely sure why we need to go through partition_sched_
> > > domains_locked() twice, as we have (it also looked like a pattern from
> > > other call paths)
> > > 
> > > update_prstate()
> > > -> update_cpumasks_hier()
> > >     -> rebuild_sched_domains_locked() <- right at the end
> > > -> update_partition_sd_lb()
> > >     -> rebuild_sched_domains_locked() <- right after the above call
> > > 
> > > Removing the first call does indeed fix the issue and domains look OK,
> > > but I'm pretty sure I'm missing all sort of details and corner cases.
> > > 
> > > Waiman (now Cc-ed), maybe you can help here understanding why the two
> > > back to back calls are needed?
> > 
> > Thanks for letting me know about this case.
> > 
> > I am aware that rebuild_sched_domains_locked() can be called more than
> > once. I have addressed the hotplug case, but it can happen in some other
> > corner cases as well. The problem with multiple
> > rebuild_sched_domains_locked() calls is the fact that intermediate ones
> > may be called where the internal states may not be consistent. I am
> > going to work on a fix to this issue by making sure that
> > rebuild_sched_domains_locked() will only be called once.
> 
> I am working on a set of cpuset patches to eliminate redundant
> rebuild_sched_domains_locked() calls. However, my cpuset test script fails
> after the change due to the presence of test cases where the only CPU in a
> 1-cpu partition is being offlined. So I sent out a sched/deadline patch [1]
> to work around this particular corner case.
> 
> [1]
> https://lore.kernel.org/lkml/20241108042924.520458-1-longman@redhat.com/T/#u
> 
> Apparently, the null total_bw bug caused by multiple
> rebuild_sched_domains_locked() calls masks this problem.
> 
> Anyway, I should be able to post the cpuset patch series next week after
> further testing. Please review my sched/deadline patch to see if you are OK
> with this minor change.

Thank you! Will take a look.

Best,
Juri