Message-ID: <bfbedc6a-9f04-472f-afe9-828efe0387e6@redhat.com>
Date: Fri, 15 Nov 2024 12:47:18 -0500
From: Waiman Long <llong@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutny <mkoutny@...e.com>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: Additional issue with cpuset isolated partitions?
On 11/15/24 11:30 AM, Juri Lelli wrote:
> Hello,
>
> While working on the recent cpuset/deadline fixes [1], I encountered
> what looks like an issue to me. What I'm doing is (based on one of the
> tests of test_cpuset_prs.sh):
>
> # echo Y >/sys/kernel/debug/sched/verbose
> # echo +cpuset >cgroup/cgroup.subtree_control
> # mkdir cgroup/A1
> # echo 0-3 >cgroup/A1/cpuset.cpus
> # echo +cpuset >cgroup/A1/cgroup.subtree_control
> # mkdir cgroup/A1/A2
> # echo 1-3 >cgroup/A1/A2/cpuset.cpus
> # echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
> # mkdir cgroup/A1/A2/A3
> # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
> # echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
> # echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
> # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
> # echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
>
> and with this, on my 8 CPUs system, I correctly get a root domain for
> 0-1,4-7 and 2,3 are left isolated (attached to default root domain).
>
> I now put the shell into the A1/A2/A3 cpuset
>
> # echo $$ >cgroup/A1/A2/A3/cgroup.procs
>
> and hotplug CPU 2,3
>
> # echo 0 >/sys/devices/system/cpu/cpu2/online
> # echo 0 >/sys/devices/system/cpu/cpu3/online
>
> guess the shell is moved to the non-isolated domain. So far so good
> then, only that if I turn CPUs 2,3 back on they are attached to the root
> domain containing the non-isolated cpus
A valid partition must have CPUs associated with it. If no CPU is
available, it becomes invalid and falls back to using the CPUs from the
parent cgroup.
>
> # echo 1 >/sys/devices/system/cpu/cpu2/online
> ...
> [ 990.133593] root domain span: 0-2,4-7
> [ 990.134480] rd 0-2,4-7
>
> # echo 1 >/sys/devices/system/cpu/cpu3/online
> ...
> [ 1082.858992] root domain span: 0-7
> [ 1082.859530] rd 0-7
>
> And now the A1/A2/A3 partition is not valid anymore
>
> # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> isolated invalid (Invalid cpu list in cpuset.cpus.exclusive)
>
> Is this expected? It looks like one needs to put at least one process in
> the partition before hotplugging its CPUs for the above to reproduce
> (hotplugging w/o processes involved leaves CPUs 2,3 in the default domain
> and isolated).
Once a partition becomes invalid, it does not recover on its own when the
CPUs come back online. Users have to explicitly re-enable it. This is a
very rare case, so we have not spent effort on automatic recovery.

If only one of the 2 CPUs is taken offline and then brought back online,
the full 2-CPU isolated partition can be recovered.
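
A rough sketch of what "explicitly re-enable" could look like, assuming it
means re-writing cpuset.cpus.partition once the CPUs are back online. The
helper name reenable_isolated_partition is hypothetical, and this has not
been verified against the reproducer above. Depending on the kernel version
and the reported reason for invalidity, the cpuset.cpus.exclusive lists may
also need to be rewritten first.

```shell
# Hypothetical helper (an assumption, not taken from the thread): re-request
# the "isolated" partition state for a cpuset cgroup.
# $1 is the cgroup directory, e.g. cgroup/A1/A2/A3 from the reproducer.
reenable_isolated_partition() {
    cg="$1"
    # Drop back to a plain member first, then ask for "isolated" again.
    echo member >"$cg/cpuset.cpus.partition" || return 1
    echo isolated >"$cg/cpuset.cpus.partition" || return 1
    # Print the resulting state; an "invalid" suffix means the request
    # was not honoured and the reason in parentheses needs fixing first.
    cat "$cg/cpuset.cpus.partition"
}
```

It would be called as "reenable_isolated_partition cgroup/A1/A2/A3" after
both CPUs are online again. If the output still reads "isolated invalid",
the partition has not been restored.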
Please let me know if you have further questions.
Cheers,
Longman