lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a188bfee-7420-4195-a813-2739f3689ba3@redhat.com>
Date: Thu, 25 Dec 2025 02:30:15 -0500
From: Waiman Long <llong@...hat.com>
To: Sun Shaojie <sunshaojie@...inos.cn>, llong@...hat.com
Cc: cgroups@...r.kernel.org, chenridong@...weicloud.com, hannes@...xchg.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
 mkoutny@...e.com, shuah@...nel.org, tj@...nel.org
Subject: Re: [PATCH v6] cpuset: Avoid invalidating sibling partitions on
 cpuset.cpus conflict.

On 12/1/25 4:38 AM, Sun Shaojie wrote:
> Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
> with its sibling partition, the sibling's partition state becomes invalid.
> However, this invalidation is often unnecessary.
>
> For example: On a machine with 128 CPUs, there are m (m < 128) cpusets
> under the root cgroup. Each cpuset is used by a single user(user-1 use
> A1, ... , user-m use Am), and the partition states of these cpusets are
> configured as follows:
>
>                             root cgroup
>          /             /                  \                 \
>         A1            A2        ...       An                Am
>       (root)        (root)      ...     (root) (root/root invalid/member)
>
> Assume that A1 through Am have not set cpuset.cpus.exclusive. When
> user-m modifies Am's cpuset.cpus to "0-127", it will cause all partition
> states from A1 to An to change from root to root invalid, as shown
> below.
>
>                             root cgroup
>          /              /                 \                 \
>         A1             A2       ...       An                Am
>   (root invalid) (root invalid) ... (root invalid) (root invalid/member)
>
> This outcome is entirely undeserved for all users from A1 to An.
>
> This patch prevents such outcomes by ensuring that modifications to
> cpuset.cpus do not affect the partition state of other sibling cpusets.
> Therefore, with this patch applied, when user-m configures Am's
> cpuset.cpus to "0-127", the result will be as follows.
>
>                             root cgroup
>          /             /                  \                 \
>         A1            A2        ...       An                Am
>       (root)        (root)      ...     (root)     (root invalid/member)
>
> It is worth noting that, since this patch enforces the exclusivity of
> sibling cpusets, setting exclusivity now follows a "first-come,
> first-served" principle.
>
> For example, consider the following four steps: before applying this
> patch, regardless of the order in which they are executed, the final
> partition state of both A1 and B1 would always be "root invalid."
>
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "1-2" > B1/cpuset.cpus            | root invalid | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
>
> After applying this patch, the first party to set "root" will maintain
> its exclusive validity. As follows:
>
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
>   #3> echo "1-2" > B1/cpuset.cpus            | root         | member       |
>   #4> echo "root" > B1/cpuset.cpus.partition | root         | root invalid |
>
>   Step                                       | A1's prstate | B1's prstate |
>   #1> echo "0-1" > B1/cpuset.cpus            | member       | member       |
>   #2> echo "root" > B1/cpuset.cpus.partition | member       | root         |
>   #3> echo "1-2" > A1/cpuset.cpus            | member       | root         |
>   #4> echo "root" > A1/cpuset.cpus.partition | root invalid | root         |
>
> In summary, if the current cpuset conflicts with its sibling cpusets on
> exclusive CPUs (If a cpuset is exclusive and its exclusive CPUs are empty,
> its allowed CPUs will be treated as exclusive CPUs), only the current
> cpuset should bear the consequences.
>
> Signed-off-by: Sun Shaojie <sunshaojie@...inos.cn>

I agree with you that it is probably not a good idea to invalidate 
partitions whenever there is a conflict. However, I have a different 
idea of how to do it. I am going to post another patch to show my idea. 
Let me know what you think about it and whether it can meet your need.

Cheers,
Longman

> ---
>   kernel/cgroup/cpuset-internal.h               |  3 +
>   kernel/cgroup/cpuset-v1.c                     | 19 ++++++
>   kernel/cgroup/cpuset.c                        | 60 ++++++++++++-------
>   .../selftests/cgroup/test_cpuset_prs.sh       | 12 ++--
>   4 files changed, 65 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index 337608f408ce..c53111998432 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -292,6 +292,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>   			    struct cpumask *new_cpus, nodemask_t *new_mems,
>   			    bool cpus_updated, bool mems_updated);
>   int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2);
>   #else
>   static inline void fmeter_init(struct fmeter *fmp) {}
>   static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
> @@ -302,6 +303,8 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>   			    bool cpus_updated, bool mems_updated) {}
>   static inline int cpuset1_validate_change(struct cpuset *cur,
>   				struct cpuset *trial) { return 0; }
> +static inline bool cpuset1_cpus_excl_conflict(struct cpuset *cs1,
> +				struct cpuset *cs2) {return false; }
>   #endif /* CONFIG_CPUSETS_V1 */
>   
>   #endif /* __CPUSET_INTERNAL_H */
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 12e76774c75b..5aa0ac092ef6 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -373,6 +373,25 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
>   	return ret;
>   }
>   
> +/*
> + * cpuset1_cpus_excl_conflict() - Check if two cpusets have exclusive CPU conflicts
> + *                                to legacy (v1)
> + * @cs1: first cpuset to check
> + * @cs2: second cpuset to check
> + *
> + * Returns: true if CPU exclusivity conflict exists, false otherwise
> + *
> + * If either cpuset is CPU exclusive, their allowed CPUs cannot intersect.
> + */
> +bool cpuset1_cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
> +{
> +	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +		return cpumask_intersects(cs1->cpus_allowed,
> +					  cs2->cpus_allowed);
> +
> +	return false;
> +}
> +
>   #ifdef CONFIG_PROC_PID_CPUSET
>   /*
>    * proc_cpuset_show()
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 52468d2c178a..e58dd26e074a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -586,14 +586,24 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
>    * Returns: true if CPU exclusivity conflict exists, false otherwise
>    *
>    * Conflict detection rules:
> - * 1. If either cpuset is CPU exclusive, they must be mutually exclusive
> + * For cgroup-v1:
> + *     see cpuset1_cpus_excl_conflict()
> + * For cgroup-v2:
> + * 1. If both cs1 and cs2 are exclusive, cs1 and cs2 must be mutually exclusive
>    * 2. exclusive_cpus masks cannot intersect between cpusets
>    * 3. The allowed CPUs of one cpuset cannot be a subset of another's exclusive CPUs
> + * 4. If a cpuset is exclusive and its exclusive CPUs are empty, its allowed CPUs
> + *    will be treated as exclusive CPUs; therefore, its allowed CPUs must not
> + *    intersect with another's exclusive CPUs.
>    */
>   static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>   {
> -	/* If either cpuset is exclusive, check if they are mutually exclusive */
> -	if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> +	/* For cgroup-v1 */
> +	if (!cpuset_v2())
> +		return cpuset1_cpus_excl_conflict(cs1, cs2);
> +
> +	/* If cpusets are exclusive, check if they are mutually exclusive*/
> +	if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>   		return !cpusets_are_exclusive(cs1, cs2);
>   
>   	/* Exclusive_cpus cannot intersect */
> @@ -609,6 +619,20 @@ static inline bool cpus_excl_conflict(struct cpuset *cs1, struct cpuset *cs2)
>   	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
>   		return true;
>   
> +	/*
> +	 * When a cpuset is exclusive and its exclusive CPUs are empty,
> +	 * its cpus_allowed cannot intersect with another cpuset's exclusive_cpus.
> +	 */
> +	if (is_cpu_exclusive(cs1) &&
> +	    cpumask_empty(cs1->exclusive_cpus) &&
> +	    cpumask_intersects(cs1->cpus_allowed, cs2->exclusive_cpus))
> +		return true;
> +
> +	if (is_cpu_exclusive(cs2) &&
> +	    cpumask_empty(cs2->exclusive_cpus) &&
> +	    cpumask_intersects(cs2->cpus_allowed, cs1->exclusive_cpus))
> +		return true;
> +
>   	return false;
>   }
>   
> @@ -2411,34 +2435,17 @@ static int cpus_allowed_validate_change(struct cpuset *cs, struct cpuset *trialc
>   					struct tmpmasks *tmp)
>   {
>   	int retval;
> -	struct cpuset *parent = parent_cs(cs);
>   
>   	retval = validate_change(cs, trialcs);
>   
>   	if ((retval == -EINVAL) && cpuset_v2()) {
> -		struct cgroup_subsys_state *css;
> -		struct cpuset *cp;
> -
>   		/*
>   		 * The -EINVAL error code indicates that partition sibling
>   		 * CPU exclusivity rule has been violated. We still allow
>   		 * the cpumask change to proceed while invalidating the
> -		 * partition. However, any conflicting sibling partitions
> -		 * have to be marked as invalid too.
> +		 * partition.
>   		 */
>   		trialcs->prs_err = PERR_NOTEXCL;
> -		rcu_read_lock();
> -		cpuset_for_each_child(cp, css, parent) {
> -			struct cpumask *xcpus = user_xcpus(trialcs);
> -
> -			if (is_partition_valid(cp) &&
> -			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
> -				rcu_read_unlock();
> -				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, tmp);
> -				rcu_read_lock();
> -			}
> -		}
> -		rcu_read_unlock();
>   		retval = 0;
>   	}
>   	return retval;
> @@ -2506,8 +2513,15 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>   	if (alloc_tmpmasks(&tmp))
>   		return -ENOMEM;
>   
> -	compute_trialcs_excpus(trialcs, cs);
> -	trialcs->prs_err = PERR_NONE;
> +	/*
> +	 * if there is exclusive CPUs conflict with the siblings,
> +	 * we still allow the cpumask change to proceed while
> +	 * invalidating the partition.
> +	 */
> +	if (compute_trialcs_excpus(trialcs, cs))
> +		trialcs->prs_err = PERR_NOTEXCL;
> +	else
> +		trialcs->prs_err = PERR_NONE;
>   
>   	retval = cpus_allowed_validate_change(cs, trialcs, &tmp);
>   	if (retval < 0)
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index a17256d9f88a..75154e22c702 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -269,7 +269,7 @@ TEST_MATRIX=(
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X3:P2    .      .     0 A1:0-2|A2:3|A3:3 A1:P0|A2:P2 3"
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3  X2-3:P2   .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
>   	" C0-3:S+ C1-3:S+ C2-3     .    X2-3   X2-3 X2-3:P2:C3 .     0 A1:0-1|A2:1|A3:2-3 A1:P0|A3:P2 2-3"
> -	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-3|A2:1-3|A3:2-3|B1:2-3 A1:P0|A3:P0|B1:P-2"
> +	" C0-3:S+ C1-3:S+ C2-3   C2-3     .      .      .      P2    0 A1:0-1|A2:1|A3:1|B1:2-3 A1:P0|A3:P0|B1:P2"
>   	" C0-3:S+ C1-3:S+ C2-3   C4-5     .      .      .      P2    0 B1:4-5 B1:P2 4-5"
>   	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3  X2-3:P2   P2    0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
>   	" C0-3:S+ C1-3:S+ C2-3    C4    X2-3   X2-3 X2-3:P2:C1-3 P2  0 A3:2-3|B1:4 A3:P2|B1:P2 2-4"
> @@ -318,7 +318,7 @@ TEST_MATRIX=(
>   	# Invalid to valid local partition direct transition tests
>   	" C1-3:S+:P2 X4:P2  .      .      .      .      .      .     0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
>   	" C1-3:S+:P2 X4:P2  .      .      .    X3:P2    .      .     0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
> -	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:4-6 A1:P-2|B1:P0"
> +	"  C0-3:P2   .      .    C4-6   C0-4     .      .      .     0 A1:0-4|B1:5-6 A1:P2|B1:P0"
>   	"  C0-3:P2   .      .    C4-6 C0-4:C0-3  .      .      .     0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
>   
>   	# Local partition invalidation tests
> @@ -388,10 +388,10 @@ TEST_MATRIX=(
>   	"  C0-1:S+  C1      .    C2-3     .      P2     .      .     0 A1:0-1|A2:1 A1:P0|A2:P-2"
>   	"  C0-1:S+ C1:P2    .    C2-3     P1     .      .      .     0 A1:0|A2:1 A1:P1|A2:P2 0-1|1"
>   
> -	# A non-exclusive cpuset.cpus change will invalidate partition and its siblings
> -	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P0"
> -	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P-1|B1:P-1"
> -	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-2|B1:2-3 A1:P0|B1:P-1"
> +	# A non-exclusive cpuset.cpus change will not invalidate its siblings partition.
> +	"  C0-1:P1   .      .    C2-3   C0-2     .      .      .     0 A1:0-2|B1:3 A1:P1|B1:P0"
> +	"  C0-1:P1   .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P-1|B1:P1"
> +	"   C0-1     .      .  P1:C2-3  C0-2     .      .      .     0 A1:0-1|B1:2-3 A1:P0|B1:P1"
>   
>   	# cpuset.cpus can overlap with sibling cpuset.cpus.exclusive but not subsumed by it
>   	"   C0-3     .      .    C4-5     X5     .      .      .     0 A1:0-3|B1:4-5"


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ