Message-ID: <ZzTrwJoTetlt2Anj@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 13 Nov 2024 18:11:12 +0000
From: Juri Lelli <juri.lelli@...hat.com>
To: Waiman Long <llong@...hat.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
	Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Qais Yousef <qyousef@...alina.io>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	"Joel Fernandes (Google)" <joel@...lfernandes.org>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Aashish Sharma <shraash@...gle.com>,
	Shin Kawamura <kawasin@...gle.com>,
	Vineeth Remanan Pillai <vineeth@...byteword.org>,
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH 2/2] sched/deadline: Correctly account for allocated
 bandwidth during hotplug

On 13/11/24 11:50, Waiman Long wrote:
> 
> On 11/13/24 11:42 AM, Waiman Long wrote:
> > 
> > On 11/13/24 11:40 AM, Juri Lelli wrote:
> > > On 13/11/24 11:06, Waiman Long wrote:
> > > 
> > > ...
> > > 
> > > > This part can still cause a failure in one of test cases in my cpuset
> > > > partition test script. In this particular case, the CPU to be
> > > > offlined is an
> > > > isolated CPU with scheduling disabled. As a result, total_bw is
> > > > 0 and the
> > > > __dl_overflow() test failed. Is there a way to skip the
> > > > __dl_overflow() test
> > > > for isolated CPUs? Can we use a null total_bw as a proxy for that?
> > > Can you please share the repro script? Would like to check locally what
> > > is going on.
> > 
> > Just run tools/testing/selftests/cgroup/test_cpuset_prs.sh.
> 
> The failing test is
> 
>         # Remote partition offline tests
>         " C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:O2=0 .   0
> A1:0-1,A2:1,A3:3 A1:P0,A3:P2 2-3"
> 
> You can remove all the previous lines in the TEST_MATRIX to get to failed
> test case immediately eliminating unnecessary noise in your testing.

So, IIUC this test is doing the following

# echo +cpuset >cgroup/cgroup.subtree_control
# mkdir cgroup/A1
# echo 0-3 >cgroup/A1/cpuset.cpus
# echo +cpuset >cgroup/A1/cgroup.subtree_control
# mkdir cgroup/A1/A2
# echo 1-3 >cgroup/A1/A2/cpuset.cpus
# echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
# mkdir cgroup/A1/A2/A3
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
# echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
# echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition

With the last command, we get to one root domain with span: 0-1,4-7 (in
my setup with 8 CPUs) and no root domain for 2,3, since they are
isolated.
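One way to double check the resulting state is to read back the partition type and the effective CPUs of a cgroup. The following is only a sketch: show_partition is a hypothetical helper name, and it assumes the standard cgroup v2 cpuset.cpus.partition and cpuset.cpus.effective files under the directory passed in.

```shell
# Hypothetical helper: print partition state and effective CPUs of a cgroup.
# Takes the cgroup directory as its only argument.
show_partition() {
    cg="$1"
    printf '%s: partition=%s effective=%s\n' "$cg" \
        "$(cat "$cg/cpuset.cpus.partition")" \
        "$(cat "$cg/cpuset.cpus.effective")"
}
```

For the setup above it would be called as `show_partition cgroup/A1/A2/A3`.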

The test then tries to hotplug CPU 2 but fails to do so, hence the
reported error.
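The failing step can be sketched by hand as below, assuming the usual sysfs CPU hotplug interface (writing 0 to the per-CPU online file). The offline_cpu helper and the SYSFS_CPU override are hypothetical, added only so the snippet can be tried against a scratch directory first; this is not how the selftest itself is written.

```shell
# Hypothetical helper: try to offline a CPU through sysfs and report the result.
# SYSFS_CPU defaults to the real sysfs path; override it for dry runs.
SYSFS_CPU="${SYSFS_CPU:-/sys/devices/system/cpu}"

offline_cpu() {
    cpu="$1"
    ctl="$SYSFS_CPU/cpu$cpu/online"
    # Writing 0 asks the kernel to take the CPU offline; the write fails
    # if the kernel rejects the request.
    if echo 0 >"$ctl" 2>/dev/null; then
        echo "cpu$cpu: offline"
    else
        echo "cpu$cpu: offline failed"
        return 1
    fi
}
```

Running `offline_cpu 2` as root after the cpuset setup above should exercise the same hotplug path the test hits.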

total_bw for CPU 2 and CPU 3 is indeed 0, and I guess we could special
case this as you suggest (nothing to really worry about if we don't have
DEADLINE tasks affined to these CPUs). But I would have expected the
fair server contribution to still show up in total_bw, so this is
something I need to check.

Thanks,
Juri

