Message-ID: <ZzsEpyU99iRLvK_3@jlelli-thinkpadt14gen4.remote.csb>
Date: Mon, 18 Nov 2024 10:11:03 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Waiman Long <llong@...hat.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
    Michal Koutny <mkoutny@...e.com>, linux-kernel@...r.kernel.org,
    cgroups@...r.kernel.org
Subject: Re: Additional issue with cpuset isolated partitions?

On 15/11/24 12:47, Waiman Long wrote:
> On 11/15/24 11:30 AM, Juri Lelli wrote:
> > Hello,
> >
> > While working on the recent cpuset/deadline fixes [1], I encountered
> > what looks like an issue to me. What I'm doing is (based on one of the
> > tests of test_cpuset_prs.sh):
> >
> > # echo Y >/sys/kernel/debug/sched/verbose
> > # echo +cpuset >cgroup/cgroup.subtree_control
> > # mkdir cgroup/A1
> > # echo 0-3 >cgroup/A1/cpuset.cpus
> > # echo +cpuset >cgroup/A1/cgroup.subtree_control
> > # mkdir cgroup/A1/A2
> > # echo 1-3 >cgroup/A1/A2/cpuset.cpus
> > # echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
> > # mkdir cgroup/A1/A2/A3
> > # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
> > # echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
> > # echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
> > # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
> > # echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
> >
> > and with this, on my 8-CPU system, I correctly get a root domain for
> > 0-1,4-7 and CPUs 2,3 are left isolated (attached to the default root domain).
> >
> > I now put the shell into the A1/A2/A3 cpuset
> >
> > # echo $$ >cgroup/A1/A2/A3/cgroup.procs
> >
> > and hotplug CPUs 2,3
> >
> > # echo 0 >/sys/devices/system/cpu/cpu2/online
> > # echo 0 >/sys/devices/system/cpu/cpu3/online
> >
> > I guess the shell is moved to the non-isolated domain. So far so good,
> > except that when I turn CPUs 2,3 back on they are attached to the root
> > domain containing the non-isolated CPUs
>
> A valid partition must have CPUs associated with it. If no CPU is available,
> it becomes invalid and falls back to using the CPUs from the parent cgroup.

Hummm, OK. But, if I don't put any process in the partition the behavior
is different, in that the partition still reads as correctly isolated
and CPUs are not moved to the root domain after hotplug, i.e.,

# echo 0 >/sys/devices/system/cpu/cpu2/online
# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
isolated
# echo 1 >/sys/devices/system/cpu/cpu2/online
# echo 1 >/sys/devices/system/cpu/cpu3/online
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
isolated

This is what puzzled me, the difference in behavior w/ or w/o a process
in the cgroup.
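
To compare the two cases, a small helper along these lines could report whether a partition still reads as valid before and after a hotplug cycle. This is only a sketch: `partition_state` is a hypothetical name, it takes the cgroup directory as its argument, and it assumes an invalid partition always shows the word "invalid" in cpuset.cpus.partition, as in the "isolated invalid (...)" output quoted in this thread.

```shell
# Hypothetical helper: print "valid", "invalid" or "member" for the cpuset
# partition of the cgroup directory given as $1.
# Assumption: an invalid partition reads as "<type> invalid (<reason>)".
partition_state() {
    state=$(cat "$1/cpuset.cpus.partition") || return 1
    case "$state" in
        *invalid*) echo invalid ;;
        member)    echo member ;;
        *)         echo valid ;;
    esac
}
```

Running `partition_state cgroup/A1/A2/A3` once with and once without a process in the cgroup, after the same offline/online sequence, would make the difference visible without reading the full reason string.
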
> > # echo 1 >/sys/devices/system/cpu/cpu2/online
> > ...
> > [ 990.133593] root domain span: 0-2,4-7
> > [ 990.134480] rd 0-2,4-7
> >
> > # echo 1 >/sys/devices/system/cpu/cpu3/online
> > ...
> > [ 1082.858992] root domain span: 0-7
> > [ 1082.859530] rd 0-7
> >
> > And now the A1/A2/A3 partition is not valid anymore
> >
> > # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> > isolated invalid (Invalid cpu list in cpuset.cpus.exclusive)
> >
> > Is this expected? It looks like one needs to put at least one process in
> > the partition before hotplugging its CPUs for the above to reproduce
> > (hotplugging w/o processes involved leaves CPUs 2,3 in the default domain
> > and isolated).
>
> Once a partition becomes invalid, there is no self-recovery if the CPUs
> come online again. Users have to explicitly re-enable it. It is really a
> very rare case and so we don't spend effort to do that.
>
> If only one of the 2 CPUs goes offline and then comes online again, the
> full 2-CPU isolated partition can be recovered.
>
> Please let me know if you have further questions.
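
For reference, explicitly re-enabling a partition could look roughly like the sketch below. `reenable_partition` is a hypothetical helper, and resetting to "member" before writing the partition type again is an assumption about what "explicitly re-enable" means here; it is not taken from this thread.

```shell
# Hypothetical helper: re-request a partition type for the cgroup directory $1.
# $2 is the partition type to request (defaults to "isolated").
# Assumption: dropping back to "member" and writing the type again makes the
# kernel re-validate the partition once its CPUs are back online.
reenable_partition() {
    cg=$1
    prs_type=${2:-isolated}
    echo member >"$cg/cpuset.cpus.partition" || return 1
    echo "$prs_type" >"$cg/cpuset.cpus.partition" || return 1
    # Print what is reported now so the caller can check the result.
    cat "$cg/cpuset.cpus.partition"
}
```

On the setup above this would be called as `reenable_partition cgroup/A1/A2/A3` after CPUs 2,3 are online again; whether the partition comes back as valid still has to be read from the printed state.
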
I see the point, but please see my only remaining question above. :)

Thanks,
Juri