[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <783b4780-72d5-43cf-9193-d26efb61ce0c@huaweicloud.com>
Date: Mon, 12 Jan 2026 10:10:38 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Jonathan Corbet <corbet@....net>,
Shuah Khan <shuah@...nel.org>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org,
Sun Shaojie <sunshaojie@...inos.cn>
Subject: Re: [PATCH cgroup/for-6.20 v3 4/5] cgroup/cpuset: Don't invalidate
sibling partitions on cpuset.cpus conflict
On 2026/1/10 9:32, Waiman Long wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
> the sibling's partition state becomes invalid. This is overly harsh and
> is probably not necessary.
>
> The cpuset.cpus.exclusive control file, if set, will override the
> cpuset.cpus of the same cpuset when creating a cpuset partition.
> So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
> a partition. However, it cannot override a conflicting cpuset.cpus file
> in a sibling cpuset and the partition creation process will fail. This
> is inconsistent. That will also make using cpuset.cpus.exclusive less
> valuable as a tool to set up cpuset partitions as the users have to
> check if such a cpuset.cpus conflict exists or not.
>
> Fix these problems by making sure that once a cpuset.cpus.exclusive
> is set without failure, it will always be allowed to form a valid
> partition as long as at least one CPU can be granted from its parent
> irrespective of the state of the siblings' cpuset.cpus values. Of
> course, setting cpuset.cpus.exclusive will fail if it conflicts with
> the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value
> of a sibling.
>
> Partition can still be created by setting only cpuset.cpus without
> setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's
> cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
> be removed from its cpuset.cpus.exclusive.effective as long as there
> is still one or more CPUs left and can be granted from its parent. This
> CPU stripping is currently done in rm_siblings_excl_cpus().
>
> The new code will now try its best to enable the creation of new
> partitions with only cpuset.cpus set without invalidating existing ones.
> However it is not guaranteed that all the CPUs requested in cpuset.cpus
> will be used in the new partition even when all these CPUs can be
> granted from the parent.
>
> This is similar to the fact that cpuset.cpus.effective may not be
> able to include all the CPUs requested in cpuset.cpus. In this case,
> the parent may not able to grant all the exclusive CPUs requested in
> cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
> already been granted to other partitions earlier.
>
> With the creation of multiple sibling partitions by setting
> only cpuset.cpus, this does have the side effect that their exact
> cpuset.cpus.exclusive.effective settings will depend on the order of
> partition creation if there are conflicts. Due to the exclusive nature
> of the CPUs in a partition, it is not easy to make it fair other than
> the old behavior of invalidating all the conflicting partitions.
>
> For example,
> # echo "0-2" > A1/cpuset.cpus
> # echo "root" > A1/cpuset.cpus.partition
> # echo A1/cpuset.cpus.partition
> root
> # echo A1/cpuset.cpus.exclusive.effective
> 0-2
> # echo "2-4" > B1/cpuset.cpus
> # echo "root" > B1/cpuset.cpus.partition
> # echo B1/cpuset.cpus.partition
> root
> # echo B1/cpuset.cpus.exclusive.effective
> 3-4
> # echo B1/cpuset.cpus.effective
> 3-4
>
Shouldn't the interface use cat rather than echo for reading?
Other than that, it looks good to me.
Reviewed-by: Chen Ridong <chenridong@...wei.com>
> For users who want to be sure that they can get most of the CPUs they
> want, cpuset.cpus.exclusive should be used instead if they can set
> it successfully without failure. Setting cpuset.cpus.exclusive will
> guarantee that sibling conflicts from then onward is no longer possible.
>
> To make this change, we have to separate out the is_cpu_exclusive()
> check in cpus_excl_conflict() into a cgroup v1 only
> cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
> helper is now no longer needed and can be removed.
>
> Some existing tests in test_cpuset_prs.sh are updated and new ones are
> added to reflect the new behavior. The cgroup-v2.rst doc file is also
> updated the clarify what exclusive CPUs will be used when a partition
> is created.
>
> Reported-by: Sun Shaojie <sunshaojie@...inos.cn>
> Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 32 +++++---
> kernel/cgroup/cpuset-internal.h | 3 +
> kernel/cgroup/cpuset-v1.c | 19 +++++
> kernel/cgroup/cpuset.c | 80 ++++++-------------
> .../selftests/cgroup/test_cpuset_prs.sh | 26 ++++--
> 5 files changed, 89 insertions(+), 71 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 510df2461aff..a3446db96cea 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -2584,9 +2584,9 @@ Cpuset Interface Files
> of this file will always be a subset of its parent's
> "cpuset.cpus.exclusive.effective" if its parent is not the root
> cgroup. It will also be a subset of "cpuset.cpus.exclusive"
> - if it is set. If "cpuset.cpus.exclusive" is not set, it is
> - treated to have an implicit value of "cpuset.cpus" in the
> - formation of local partition.
> + if it is set. This file should only be non-empty if either
> + "cpuset.cpus.exclusive" is set or when the current cpuset is
> + a valid partition root.
>
> cpuset.cpus.isolated
> A read-only and root cgroup only multiple values file.
> @@ -2618,13 +2618,22 @@ Cpuset Interface Files
> There are two types of partitions - local and remote. A local
> partition is one whose parent cgroup is also a valid partition
> root. A remote partition is one whose parent cgroup is not a
> - valid partition root itself. Writing to "cpuset.cpus.exclusive"
> - is optional for the creation of a local partition as its
> - "cpuset.cpus.exclusive" file will assume an implicit value that
> - is the same as "cpuset.cpus" if it is not set. Writing the
> - proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
> - before the target partition root is mandatory for the creation
> - of a remote partition.
> + valid partition root itself.
> +
> + Writing to "cpuset.cpus.exclusive" is optional for the creation
> + of a local partition as its "cpuset.cpus.exclusive" file will
> + assume an implicit value that is the same as "cpuset.cpus" if it
> + is not set. Writing the proper "cpuset.cpus.exclusive" values
> + down the cgroup hierarchy before the target partition root is
> + mandatory for the creation of a remote partition.
> +
> + Not all the CPUs requested in "cpuset.cpus.exclusive" can be
> + used to form a new partition. Only those that were present
> + in its parent's "cpuset.cpus.exclusive.effective" control
> + file can be used. For partitions created without setting
> + "cpuset.cpus.exclusive", exclusive CPUs specified in sibling's
> + "cpuset.cpus.exclusive" or "cpuset.cpus.exclusive.effective"
> + also cannot be used.
>
> Currently, a remote partition cannot be created under a local
> partition. All the ancestors of a remote partition root except
> @@ -2632,6 +2641,9 @@ Cpuset Interface Files
>
> The root cgroup is always a partition root and its state cannot
> be changed. All other non-root cgroups start out as "member".
> + Even though the "cpuset.cpus.exclusive*" control files are not
> + present in the root cgroup, they are implicitly the same as
> + "cpuset.cpus".
>
> When set to "root", the current cgroup is the root of a new
> partition or scheduling domain. The set of exclusive CPUs is
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index e718a4f54360..e8e2683cb067 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -312,6 +312,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
> struct cpumask *new_cpus, nodemask_t *new_mems,
> bool cpus_updated, bool mems_updated);
> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
> void cpuset1_init(struct cpuset *cs);
> void cpuset1_online_css(struct cgroup_subsys_state *css);
> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> @@ -326,6 +327,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
> bool cpus_updated, bool mems_updated) {}
> static inline int cpuset1_validate_change(struct cpuset *cur,
> struct cpuset *trial) { return 0; }
> +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
> + struct cpuset *cs2) { return false; }
> static inline void cpuset1_init(struct cpuset *cs) {}
> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index ecfea7800f0d..04124c38a774 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
> return ret;
> }
>
> +/*
> + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
> + * to legacy (v1)
> + * @cs1: first cpuset to check
> + * @cs2: second cpuset to check
> + *
> + * Returns: true if CPU exclusivity conflict exists, false otherwise
> + *
> + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
> + */
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> +{
> + if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> + return cpumask_intersects(cs1->cpus_allowed,
> + cs2->cpus_allowed);
> +
> + return false;
> +}
> +
> #ifdef CONFIG_PROC_PID_CPUSET
> /*
> * proc_cpuset_show()
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 4819ab429771..83fb83a86b4b 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -129,6 +129,17 @@ static bool force_sd_rebuild;
> * For simplicity, a local partition can be created under a local or remote
> * partition but a remote partition cannot have any partition root in its
> * ancestor chain except the cgroup root.
> + *
> + * A valid partition can be formed by setting exclusive_cpus or cpus_allowed
> + * if exclusive_cpus is not set. In the case of partition with empty
> + * exclusive_cpus, all the conflicting exclusive CPUs specified in the
> + * following cpumasks of sibling cpusets will be removed from its
> + * cpus_allowed in determining its effective_xcpus.
> + * - effective_xcpus
> + * - exclusive_cpus
> + *
> + * The "cpuset.cpus.exclusive" control file should be used for setting up
> + * partition if the users want to get as many CPUs as possible.
> */
> #define PRS_MEMBER 0
> #define PRS_ROOT 1
> @@ -616,27 +627,25 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
> * Returns: true if CPU exclusivity conflict exists, false otherwise
> *
> * Conflict detection rules:
> - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> - * 2. exclusive_cpus masks cannot intersect between cpusets
> - * 3. The allowed CPUs of a sibling cpuset cannot be a subset of the new exclusive CPUs
> + * o cgroup v1
> + * See cpuset1_cpus_excl_conflict()
> + * o cgroup v2
> + * - The exclusive_cpus values cannot overlap.
> + * - New exclusive_cpus cannot be a superset of a sibling's cpus_allowed.
> */
> static inline bool cpus_excl_conflict(struct cpuset *trial, struct cpuset *sibling,
> bool xcpus_changed)
> {
> - /* If either cpuset is exclusive, check if they are mutually exclusive */
> - if (is_cpu_exclusive(trial) || is_cpu_exclusive(sibling))
> - return !cpusets_are_exclusive(trial, sibling);
> -
> - /* Exclusive_cpus cannot intersect */
> - if (cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus))
> - return true;
> + if (!cpuset_v2())
> + return cpuset1_cpus_excl_conflict(trial, sibling);
>
> /* The cpus_allowed of a sibling cpuset cannot be a subset of the new exclusive_cpus */
> if (xcpus_changed && !cpumask_empty(sibling->cpus_allowed) &&
> cpumask_subset(sibling->cpus_allowed, trial->exclusive_cpus))
> return true;
>
> - return false;
> + /* Exclusive_cpus cannot intersect */
> + return cpumask_intersects(trial->exclusive_cpus, sibling->exclusive_cpus);
> }
>
> static inline bool mems_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> @@ -2312,43 +2321,6 @@ static enum prs_errcode validate_partition(struct cpuset *cs, struct cpuset *tri
> return PERR_NONE;
> }
>
> -static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialcs,
> - struct tmpmasks *tmp)
> -{
> - int retval;
> - struct cpuset *parent = parent_cs(cs);
> -
> - retval = validate_change(cs, trialcs);
> -
> - if ((retval == -EINVAL) && cpuset_v2()) {
> - struct cgroup_subsys_state *css;
> - struct cpuset *cp;
> -
> - /*
> - * The -EINVAL error code indicates that partition sibling
> - * CPU exclusivity rule has been violated. We still allow
> - * the cpumask change to proceed while invalidating the
> - * partition. However, any conflicting sibling partitions
> - * have to be marked as invalid too.
> - */
> - trialcs->prs_err = PERR_NOTEXCL;
> - rcu_read_lock();
> - cpuset_for_each_child(cp, css, parent) {
> - struct cpumask *xcpus = user_xcpus(trialcs);
> -
> - if (is_partition_valid(cp) &&
> - cpumask_intersects(xcpus, cp->effective_xcpus)) {
> - rcu_read_unlock();
> - update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
> - rcu_read_lock();
> - }
> - }
> - rcu_read_unlock();
> - retval = 0;
> - }
> - return retval;
> -}
> -
> /**
> * partition_cpus_change - Handle partition state changes due to CPU mask updates
> * @cs: The target cpuset being modified
> @@ -2408,15 +2380,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
> return 0;
>
> - if (alloc_tmpmasks(&tmp))
> - return -ENOMEM;
> -
> compute_trialcs_excpus(trialcs, cs);
> trialcs->prs_err = PERR_NONE;
>
> - retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
> + retval = validate_change(cs, trialcs);
> if (retval < 0)
> - goto out_free;
> + return retval;
> +
> + if (alloc_tmpmasks(&tmp))
> + return -ENOMEM;
>
> /*
> * Check all the descendants in update_cpumasks_hier() if
> @@ -2439,7 +2411,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
> /* Update CS_SCHED_LOAD_BALANCE and/or sched_domains, if necessary */
> if (cs->partition_root_state)
> update_partition_sd_lb(cs, old_prs);
> -out_free:
> +
> free_tmpmasks(&tmp);
> return retval;
> }
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index a17256d9f88a..ff4540b0490e 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -269,7 +269,7 @@ TEST_MATRIX=(
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X3:P2 . . 0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> " C0-3:S+ C1-3:S+ C2-3 . X2-3 X2-3 X2-3:P2:C3 . 0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> - " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
> + " C0-3:S+ C1-3:S+ C2-3 C2-3 . . . P2 0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
> " C0-3:S+ C1-3:S+ C2-3 C4-5 . . . P2 0 B1:4-5 B1:P2 4-5"
> " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> " C0-3:S+ C1-3:S+ C2-3 C4 X2-3 X2-3 X2-3:P2:C1-3 P2 0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> @@ -318,7 +318,7 @@ TEST_MATRIX=(
> # Invalid to valid local partition direct transition tests
> " C1-3:S+:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
> " C1-3:S+:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
> - " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
> + " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
> " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
>
> # Local partition invalidation tests
> @@ -388,10 +388,10 @@ TEST_MATRIX=(
> " C0-1:S+ C1 . C2-3 . P2 . . 0 A1:0-1|A2:1 A1:P0|A2:P-2"
> " C0-1:S+ C1:P2 . C2-3 P1 . . . 0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
>
> - # A non-exclusive cpuset.cpus change will invalidate partition and its siblings
> - " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
> - " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
> - " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
> + # A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
> + " C0-1:P1 . . C2-3 C0-2 . . . 0 A1:0-2|B1:3 A1:P1|B1:P0"
> + " C0-1:P1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|XA1:0-1|B1:2-3 A1:P1|B1:P1"
> + " C0-1 . . P1:C2-3 C0-2 . . . 0 A1:0-1|B1:2-3 A1:P0|B1:P1"
>
> # cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
> " C0-3 . . C4-5 X5 . . . 0 A1:0-3|B1:4-5"
> @@ -417,6 +417,14 @@ TEST_MATRIX=(
> " CX1-4:S+ CX2-4:P2 . C5-6 . . . P1:C3-6 0 A1:1|A2:2-4|B1:5-6 \
> A1:P0|A2:P2:B1:P-1 2-4"
>
> + # When multiple partitions with conflicting cpuset.cpus are created, the
> + # latter created ones will only get what are left of the available exclusive
> + # CPUs.
> + " C1-3:P1 . . . . . . C3-5:P1 0 A1:1-3|B1:4-5:XB1:4-5 A1:P1|B1:P1"
> +
> + # cpuset.cpus can be set to a subset of sibling's cpuset.cpus.exclusive
> + " C1-3:X1-3 . . C4-5 . . . C1-2 0 A1:1-3|B1:1-2"
> +
> # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
> # ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
> # Failure cases:
> @@ -427,7 +435,7 @@ TEST_MATRIX=(
> # Changes to cpuset.cpus.exclusive that violate exclusivity rule is rejected
> " C0-3 . . C4-5 X0-3 . . X3-5 1 A1:0-3|B1:4-5"
>
> - # cpuset.cpus cannot be a subset of sibling cpuset.cpus.exclusive
> + # cpuset.cpus.exclusive cannot be set to a superset of sibling's cpuset.cpus
> " C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
> )
>
> @@ -477,6 +485,10 @@ REMOTE_TEST_MATRIX=(
> . . X1-2:P2 X4-5:P1 . X1-7:P2 p1:3|c11:1-2|c12:4:c22:5-6 \
> p1:P0|p2:P1|c11:P2|c12:P1|c22:P2 \
> 1-2,4-6|1-2,5-6"
> + # c12 whose cpuset.cpus CPUs are all granted to c11 will become invalid partition
> + " C1-5:P1:S+ . C1-4:P1 C2-3 . . \
> + . . . P1 . . p1:5|c11:1-4|c12:5 \
> + p1:P1|c11:P1|c12:P-1"
> )
>
> #
--
Best regards,
Ridong
Powered by blists - more mailing lists