Message-ID: <a8e3dfbd-0efa-4b4e-bc18-d00abaa79f14@redhat.com>
Date: Sat, 9 Nov 2024 13:18:17 -0500
From: Waiman Long <llong@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>,
Joel Fernandes <joel@...lfernandes.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>
Subject: Re: [PATCH] dl_server: Reset DL server params when rd changes
On 11/8/24 10:30 PM, Waiman Long wrote:
> I have the patchset to enforce that rebuild_sched_domains_locked()
> will only be called at most once per cpuset operation.
>
> By adding some debug code to further study the null total_bw issue
> when cpuset.cpus.partition is being changed, I found that eliminating
> the redundant rebuild_sched_domains_locked() calls reduced the chance
> of hitting null total_bw, but did not eliminate it. Running my cpuset
> test script, I hit 250 cases of null total_bw with the v6.12-rc6
> kernel. With my new cpuset patch applied, that drops to 120 cases.
>
> I will try to look further for the exact condition that triggers null
> total_bw generation.
After further testing, the 120 cases of null total_bw can be classified
into the following 3 categories.
1) 51 cases where an isolated partition with isolated CPUs is created.
Isolated CPUs are not subject to scheduling in a sched domain, so a
total_bw of 0 is fine and not really a problem.
2) 67 cases where nested partitions (A2 inside A1) are being removed.
This is probably caused by some kind of race condition. If I insert an
artificial delay between the removal of A2 and A1, total_bw is fine.
If there is no delay, I can see a null total_bw (see the sketch after
the example script below). That shouldn't really be a problem in
practice, though we may still need to figure out why it happens.
3) 2 cases where null total_bw is seen when a new partition is created
by moving all the CPUs of the parent cgroup into the new child
partition, leaving the parent as a null partition with no CPU. The
following example illustrates the steps.
#!/bin/bash
CGRP=/sys/fs/cgroup
cd $CGRP
echo +cpuset > cgroup.subtree_control
mkdir A1
cd A1
echo 0-1 > cpuset.cpus
echo root > cpuset.cpus.partition
echo "A1 partition"
echo +cpuset > cgroup.subtree_control
mkdir A2
cd A2
echo 0-1 > cpuset.cpus
echo root > cpuset.cpus.partition
echo "A2 partition"
cd ..
echo "Remove A2"
rmdir A2
cd ..
echo "Remove A1"
rmdir A1
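As a rough way to narrow down where total_bw goes null, the root
domain dl_bw values can be dumped after each step. The sketch below is
only an illustration with a hypothetical check_bw() helper: it assumes
a kernel built with CONFIG_SCHED_DEBUG and debugfs mounted, where
print_dl_rq() prints dl_bw->total_bw in the sched debug file (older
kernels expose the same data in /proc/sched_debug).

#!/bin/bash
# Sketch: print dl_bw->total_bw as seen from CPUs 0-1.
# Assumes CONFIG_SCHED_DEBUG and debugfs at /sys/kernel/debug.
check_bw() {
	echo "== $1 =="
	grep -A 3 '^dl_rq\[[01]\]' /sys/kernel/debug/sched/debug |
		grep 'dl_bw->total_bw'
}
check_bw "baseline"

Calling check_bw after each partition change in the script above shows
which transition zeroes total_bw. For the nested removal case in 2),
adding a "sleep 1" between "rmdir A2" and "rmdir A1" is enough to make
the null total_bw go away here.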
In this corner case, there is actually no change in the set of sched
domains: the sched domain covering CPUs 0-1 simply moves from
partition A1 to A2 when A2 is created, and back again when A2 is
removed. This is similar to calling rebuild_sched_domains_locked()
twice with the same input. I believe that is the condition that causes
the null total_bw.
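One way to check whether that same-input rebuild is really what
precedes the null total_bw is to function-trace the rebuild path while
the script runs. This is only a sketch, assuming tracefs is mounted at
/sys/kernel/tracing and that these symbols are not inlined (they have
to show up in available_filter_functions):

#!/bin/bash
# Sketch: trace sched domain rebuilds and DL bandwidth clearing.
cd /sys/kernel/tracing
echo partition_sched_domains_locked dl_clear_root_domain \
	dl_add_task_root_domain > set_ftrace_filter
echo function > current_tracer
echo 1 > tracing_on
# ... run the cpuset test script here ...
echo 0 > tracing_on
cat trace

If the trace shows dl_clear_root_domain() being called a second time
without the corresponding bandwidth being re-added afterwards, that
would point at the deadline side rather than at cpuset.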
Now the question is why the deadline code behaves this way. It is
probably a bug that needs to be addressed.
Cheers,
Longman