Message-ID: <chijw6gvtql74beputm3ue2zu2vmrwvtg5a2bn3wabgkqldq4d@obrdh4znejaw>
Date: Thu, 8 Jan 2026 20:04:04 +0100
From: Michal Koutný <mkoutny@...e.com>
To: Waiman Long <longman@...hat.com>
Cc: Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org,
Sun Shaojie <sunshaojie@...inos.cn>, Chen Ridong <chenridong@...weicloud.com>
Subject: Re: [cgroup/for-6.20 PATCH v2 4/4] cgroup/cpuset: Don't invalidate
sibling partitions on cpuset.cpus conflict
Hi.
On Thu, Jan 01, 2026 at 02:15:58PM -0500, Waiman Long <longman@...hat.com> wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
> the sibling's partition state becomes invalid. This is overly harsh and
> is probably not necessary.
>
> The cpuset.cpus.exclusive control file, if set, will override the
> cpuset.cpus of the same cpuset when creating a cpuset partition.
> So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
> a partition. However, it cannot override a conflicting cpuset.cpus file
> in a sibling cpuset and the partition creation process will fail. This
> is inconsistent. That will also make using cpuset.cpus.exclusive less
> valuable as a tool to set up cpuset partitions as the users have to
> check if such a cpuset.cpus conflict exists or not.
>
> Fix these problems by strictly adhering to the setting of the
> following control files in descending order of priority when setting
> up a partition.
>
> 1. cpuset.cpus.exclusive.effective of a valid partition
> 2. cpuset.cpus.exclusive
> 3. cpuset.cpus
>
> So once a cpuset.cpus.exclusive is set without failure, it will
> always be allowed to form a valid partition as long as at least one
> CPU can be granted from its parent irrespective of the state of the
> siblings' cpuset.cpus values. Of course, setting cpuset.cpus.exclusive
> will fail if it conflicts with the cpuset.cpus.exclusive or the
> cpuset.cpus.exclusive.effective value of a sibling.
Concept question:
When a/b/cpuset.cpus.exclusive ⊂ a/b/cpuset.cpus (proper subset)
and a/b/cpuset.cpus.partition == root, a/cpuset.cpus.partition == root
(b is valid partition)
should a/b/cpuset.cpus.exclusive.effective be equal to cpuset.cpus (as
all of them happen to be exclusive) or "only" cpuset.cpus.exclusive?
> Partition can still be created by setting only cpuset.cpus without
> setting cpuset.cpus.exclusive. However, any conflicting CPUs in siblings'
> cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
> be removed from its cpuset.cpus.exclusive.effective as long as there
> are still one or more CPUs left that can be granted from its parent. This
> CPU stripping is currently done in rm_siblings_excl_cpus().
>
> The new code will now try its best to enable the creation of new
> partitions with only cpuset.cpus set without invalidating existing ones.
OK. (After I re-learnt the benefits of remote partitions or more precisely
cpuset.cpus.effective.)
> However it is not guaranteed that all the CPUs requested in cpuset.cpus
> will be used in the new partition even when all these CPUs can be
> granted from the parent.
>
> This is similar to the fact that cpuset.cpus.effective may not be
> able to include all the CPUs requested in cpuset.cpus. In this case,
> the parent may not be able to grant all the exclusive CPUs requested in
> cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
> already been granted to other partitions earlier.
>
> With the creation of multiple sibling partitions by setting
> only cpuset.cpus, this does have the side effect that their exact
> cpuset.cpus.exclusive.effective settings will depend on the order of
> partition creation if there are conflicts. Due to the exclusive nature
> of the CPUs in a partition, it is not easy to make it fair other than
> the old behavior of invalidating all the conflicting partitions.
>
> For example,
> # echo "0-2" > A1/cpuset.cpus
> # echo "root" > A1/cpuset.cpus.partition
> # cat A1/cpuset.cpus.partition
> root
> # cat A1/cpuset.cpus.exclusive.effective
> 0-2
> # echo "2-4" > B1/cpuset.cpus
> # echo "root" > B1/cpuset.cpus.partition
> # cat B1/cpuset.cpus.partition
> root
> # cat B1/cpuset.cpus.exclusive.effective
> 3-4
> # cat B1/cpuset.cpus.effective
> 3-4
>
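The CPU stripping described above (done by rm_siblings_excl_cpus() in the patch) and its order dependence can be illustrated with a small bash model (bash 4+). This is a hypothetical sketch, not the kernel algorithm. `strip_sibling_cpus` and its helpers are made-up names. The sketch assumes the parent can grant every requested CPU and that cpulists are sorted, and it ignores cpuset.cpus.effective. It replays the A1/B1 example in both creation orders.

```shell
#!/usr/bin/env bash
# Hypothetical model of sibling CPU stripping; not kernel code.

# Expand a cpulist such as "0-2,5" into "0 1 2 5".
expand_cpulist() {
  local part lo hi cpu
  local -a parts=() out=()
  IFS=',' read -ra parts <<< "$1"
  for part in "${parts[@]}"; do
    [[ -z $part ]] && continue
    if [[ $part == *-* ]]; then
      lo=${part%-*} hi=${part#*-}
    else
      lo=$part hi=$part
    fi
    for ((cpu = lo; cpu <= hi; cpu++)); do out+=("$cpu"); done
  done
  echo "${out[*]}"
}

# Compress sorted CPU numbers such as "0 1 4 6 7" back into "0-1,4,6-7".
compress_cpulist() {
  local -a cpus=()
  local out="" start prev cpu i
  read -ra cpus <<< "$1"
  (( ${#cpus[@]} )) || { echo ""; return 0; }
  start=${cpus[0]} prev=${cpus[0]}
  for ((i = 1; i <= ${#cpus[@]}; i++)); do
    cpu=${cpus[i]:-}
    if [[ -n $cpu ]] && (( cpu == prev + 1 )); then
      prev=$cpu                       # still inside a contiguous run
      continue
    fi
    # Close the current run, then start a new one at $cpu (empty at the end).
    if (( start == prev )); then
      out+="${out:+,}$start"
    else
      out+="${out:+,}$start-$prev"
    fi
    start=$cpu prev=$cpu
  done
  echo "$out"
}

# Keep the CPUs of REQUESTED (the new partition's cpuset.cpus) that are not
# listed in any sibling's cpuset.cpus.exclusive(.effective), passed as
# further arguments. Print what is kept. Return 1 if nothing is left, i.e.
# the new partition could not become valid.
strip_sibling_cpus() {
  local requested=$1 sib cpu
  shift
  local -A taken=()
  local -a kept=()
  for sib in "$@"; do
    for cpu in $(expand_cpulist "$sib"); do taken[$cpu]=1; done
  done
  for cpu in $(expand_cpulist "$requested"); do
    [[ -n ${taken[$cpu]:-} ]] || kept+=("$cpu")
  done
  (( ${#kept[@]} )) || return 1
  compress_cpulist "${kept[*]}"
}

# A1 created first, then B1 (the order used in the example above).
a1=$(strip_sibling_cpus 0-2)
b1=$(strip_sibling_cpus 2-4 "$a1")
echo "A1 first: A1=$a1 B1=$b1"

# The same requests in the opposite order.
b1=$(strip_sibling_cpus 2-4)
a1=$(strip_sibling_cpus 0-2 "$b1")
echo "B1 first: A1=$a1 B1=$b1"
```

Under these assumptions the script prints `A1 first: A1=0-2 B1=3-4`, which matches the example, and `B1 first: A1=0-1 B1=2-4`. CPU 2 goes to whichever partition is created first.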
> For users who want to be sure that they can get most of the CPUs they
> want,
Slightly OT but I'd say that users want:
a) confinement (some cpuset.cpus in leaves)
b) isolation (cpuset.cpus.exclusive in leaves)
c) hierarchical organization
- confinement generalizes OK
- children can only claim what parent allowed
Conflicting exclusivity configs should not be any user's intention or want :-p
> cpuset.cpus.exclusive should be used instead if they can set
> it successfully without failure. Setting cpuset.cpus.exclusive will
> guarantee that sibling conflicts from then onward are no longer possible.
I think the background idea of the paragraph (shift away from local to
remote partitions, also mentioned the other day) could somehow be fitted
into the Documentation/ hunks.
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> ...
> @@ -2632,6 +2641,9 @@ Cpuset Interface Files
>
> The root cgroup is always a partition root and its state cannot
> be changed. All other non-root cgroups start out as "member".
> + Even though the "cpuset.cpus.exclusive*" control files are not
> + present in the root cgroup, they are implicitly the same as
> + "cpuset.cpus".
Even "cpuset.cpus" has CFTYPE_NOT_ON_ROOT, so this formulation might be
confusing. Maybe it's the same as "cpuset.cpus.effective"?
Thanks,
Michal