Message-ID: <ZzTrwJoTetlt2Anj@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 13 Nov 2024 18:11:12 +0000
From: Juri Lelli <juri.lelli@...hat.com>
To: Waiman Long <llong@...hat.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Qais Yousef <qyousef@...alina.io>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH 2/2] sched/deadline: Correctly account for allocated
bandwidth during hotplug
On 13/11/24 11:50, Waiman Long wrote:
>
> On 11/13/24 11:42 AM, Waiman Long wrote:
> >
> > On 11/13/24 11:40 AM, Juri Lelli wrote:
> > > On 13/11/24 11:06, Waiman Long wrote:
> > >
> > > ...
> > >
> > > > This part can still cause a failure in one of test cases in my cpuset
> > > > partition test script. In this particular case, the CPU to be
> > > > offlined is an
> > > > isolated CPU with scheduling disabled. As a result, total_bw is
> > > > 0 and the
> > > > __dl_overflow() test failed. Is there a way to skip the
> > > > __dl_overflow() test
> > > > for isolated CPUs? Can we use a null total_bw as a proxy for that?
> > > Can you please share the repro script? Would like to check locally what
> > > is going on.
> >
> > Just run tools/testing/selftests/cgroup/test_cpuset_prs.sh.
>
> The failing test is
>
> # Remote partition offline tests
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:O2=0 . 0
> A1:0-1,A2:1,A3:3 A1:P0,A3:P2 2-3"
>
> You can remove all the previous lines in the TEST_MATRIX to get to failed
> test case immediately eliminating unnecessary noise in your testing.
So, IIUC, this test is doing the following:
# echo +cpuset >cgroup/cgroup.subtree_control
# mkdir cgroup/A1
# echo 0-3 >cgroup/A1/cpuset.cpus
# echo +cpuset >cgroup/A1/cgroup.subtree_control
# mkdir cgroup/A1/A2
# echo 1-3 >cgroup/A1/A2/cpuset.cpus
# echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
# mkdir cgroup/A1/A2/A3
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
# echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
# echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
With the last command, we get to one root domain with span: 0-1,4-7 (in
my setup with 8 CPUs) and no root domain for 2,3, since they are
isolated.
The test then tries to hotplug CPU 2 but fails to do so, hence the
reported error.
total_bw for CPU 2 and CPU 3 is indeed 0, and I guess we could special
case this as you suggest (nothing to really worry about if we don't have
DEADLINE tasks affined to these CPUs). But I would have expected the
fair server contribution to still show up in total_bw, so this is
something I need to check.
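To make the special case concrete, here is a minimal userspace sketch of
an admission check shaped like __dl_overflow(), plus a guard that skips
it when total_bw is 0. The struct and function names
(dl_bw_sketch, dl_overflow_sketch, offline_would_overflow) are
hypothetical, the shift constant is an assumption, and this is not the
patch itself. It also shows one way a zero total_bw could trip an
unsigned comparison of this shape if a non-zero old_bw is subtracted
from it; whether that is what happens in the failing test is not
confirmed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed-point shift for CPU capacity (1024 == one full CPU). */
#define CAPACITY_SHIFT_SKETCH 10

/* Hypothetical stand-in for the per-root-domain bandwidth bookkeeping. */
struct dl_bw_sketch {
	int64_t  bw;        /* per-CPU bandwidth limit, -1 means "no limit" */
	uint64_t total_bw;  /* bandwidth currently admitted */
};

/* Scale a bandwidth value by a capacity expressed in 1/1024 units. */
static uint64_t cap_scale_sketch(uint64_t v, unsigned long cap)
{
	return (v * cap) >> CAPACITY_SHIFT_SKETCH;
}

/*
 * Overflow check: true if the admitted bandwidth, after swapping
 * old_bw for new_bw, exceeds what the given capacity allows.
 * Note the unsigned arithmetic: if old_bw > total_bw the subtraction
 * wraps around to a huge value and the check reports an overflow.
 */
static bool dl_overflow_sketch(const struct dl_bw_sketch *b,
			       unsigned long cap,
			       uint64_t old_bw, uint64_t new_bw)
{
	return b->bw != -1 &&
	       cap_scale_sketch((uint64_t)b->bw, cap) <
	       b->total_bw - old_bw + new_bw;
}

/*
 * Hypothetical guard: treat total_bw == 0 as "nothing to account for"
 * (e.g. an isolated CPU with no DEADLINE bandwidth) and skip the check.
 */
static bool offline_would_overflow(const struct dl_bw_sketch *b,
				   unsigned long remaining_cap,
				   uint64_t old_bw)
{
	if (b->total_bw == 0)
		return false;
	return dl_overflow_sketch(b, remaining_cap, old_bw, 0);
}
```

With made-up numbers (bw = 996147, total_bw = 0, old_bw = 52428), the
unguarded check reports an overflow because of the wrap-around, while
the guarded variant lets the offline proceed.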
Thanks,
Juri