linux-kernel - Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2


Message-ID: <2ca99986-b15b-45bc-b2ee-23d9e5395691@huaweicloud.com>
Date: Fri, 14 Nov 2025 09:29:20 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Michal Koutný <mkoutny@...e.com>,
 Sun Shaojie <sunshaojie@...inos.cn>
Cc: llong@...hat.com, cgroups@...r.kernel.org, hannes@...xchg.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
 shuah@...nel.org, tj@...nel.org
Subject: Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2



On 2025/11/14 1:07, Michal Koutný wrote:
> Hello.
> 
> On Thu, Nov 13, 2025 at 09:14:34PM +0800, Sun Shaojie <sunshaojie@...inos.cn> wrote:
>> In cgroup v2, a mutual overlap check is required when at least one of two
>> cpusets is exclusive. However, this check should be relaxed and limited to
>> cases where both cpusets are exclusive.
>>
>> The table 1 shows the partition states of A1 and B1 after each step before
>> applying this patch.
>>
>> Table 1: Before applying the patch
>>  Step                                       | A1's prstate | B1's prstate |
>>  #1> mkdir -p A1                            | member       |              |
>>  #2> echo "0-1" > A1/cpuset.cpus            | member       |              |
>>  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>  #4> mkdir -p B1                            | root         | member       |
>>  #5> echo "0-3" > B1/cpuset.cpus            | root invalid | member       |
>>  #6> echo "root" > B1/cpuset.cpus.partition | root invalid | root invalid |
>>
>> After step #5, A1 changes from "root" to "root invalid" because its CPUs
>> (0-1) overlap with those requested by B1 (0-3). However, B1 can actually
>> use CPUs 2-3, so it would be more reasonable for A1 to remain as "root."
> 
> I remember there was the addition of cgroup_file_notify() for the
> cpuset.cpus.partition so that such changes can be watched for.
> 

This behavior is visible to user space, I think.

After further consideration, I still suggest retaining this rule.

If we relax this rule, shouldn't the following checks be relaxed as well?

	/* The cpus_allowed of one cpuset cannot be a subset of another cpuset's exclusive_cpus */
	if (!cpumask_empty(cs1->cpus_allowed) &&
	    cpumask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
		return true;

	if (!cpumask_empty(cs2->cpus_allowed) &&
	    cpumask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
		return true;
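
To make the effect of these two checks easier to see, here is a rough userspace sketch, not kernel
code. It assumes a 64-bit integer as a stand-in for a cpumask; cpu_mask, struct cpuset_model,
cpu_range(), mask_subset() and subset_conflict() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit N set means CPU N. */
typedef uint64_t cpu_mask;

/* Hypothetical model holding only the two fields the quoted checks read. */
struct cpuset_model {
	cpu_mask cpus_allowed;   /* models cpuset.cpus */
	cpu_mask exclusive_cpus; /* models cpuset.cpus.exclusive */
};

/* Build a mask covering CPUs first..last inclusive (both must be below 64). */
static cpu_mask cpu_range(unsigned int first, unsigned int last)
{
	cpu_mask m = 0;

	assert(first <= last && last < 64);
	for (unsigned int cpu = first; cpu <= last; cpu++)
		m |= (cpu_mask)1 << cpu;
	return m;
}

/* True when every CPU in a is also in b; the empty mask is a subset of anything. */
static bool mask_subset(cpu_mask a, cpu_mask b)
{
	return (a & ~b) == 0;
}

/*
 * Mirrors the two quoted checks: a conflict is reported when one side's
 * non-empty cpus_allowed lies entirely inside the other side's exclusive_cpus.
 */
static bool subset_conflict(const struct cpuset_model *cs1,
			    const struct cpuset_model *cs2)
{
	if (cs1->cpus_allowed != 0 &&
	    mask_subset(cs1->cpus_allowed, cs2->exclusive_cpus))
		return true;

	if (cs2->cpus_allowed != 0 &&
	    mask_subset(cs2->cpus_allowed, cs1->exclusive_cpus))
		return true;

	return false;
}
```

In this model, a sibling whose cpus_allowed is CPU 0 conflicts with a cpuset whose exclusive_cpus is
0-1, while a sibling asking for 0-3 does not trip these two particular checks, because 0-3 is not
fully contained in 0-1.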


For example:
  Step                                       | A1's prstate | B1's prstate |
  #1> mkdir -p A1                            | member       |              |
  #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
  #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
  #4> mkdir -p B1                            | root         | member       |
  #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |

Currently, we mark A1 as invalid. But by the same logic as this patch, why must A1 be
invalidated? B1 could fall back to the parent's effective CPUs, right?

This raises the question: should we relax the restriction so that a cpuset's cpus may be a subset
of a sibling's exclusive_cpus, thereby keeping A1 valid? If we do, users may struggle to understand
what their cpuset.cpus.effective value is, and why it does not match what they expected.
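
A small userspace sketch of why the resulting value could surprise users. It is only a simplified
model under stated assumptions: the parent owns CPUs 0-3, A1 stays a valid root partition holding
0-1, and a member cpuset whose requested CPUs are all unavailable falls back to the parent's
effective CPUs. cpu_mask, parent_remaining() and member_effective() are hypothetical names, not
kernel functions.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit N set means CPU N. */
typedef uint64_t cpu_mask;

/* CPUs the parent can still hand out once a sibling partition took its exclusive CPUs. */
static cpu_mask parent_remaining(cpu_mask parent_cpus, cpu_mask sibling_exclusive)
{
	return parent_cpus & ~sibling_exclusive;
}

/*
 * Assumption: a member's effective CPUs are its requested CPUs limited to
 * what the parent still has, falling back to the parent's effective CPUs
 * when none of the requested CPUs are available.
 */
static cpu_mask member_effective(cpu_mask requested, cpu_mask parent_effective)
{
	cpu_mask usable = requested & parent_effective;

	return usable ? usable : parent_effective;
}
```

With parent CPUs 0-3 (0xF) and A1 holding 0-1 (0x3), parent_remaining() yields 2-3 (0xC). If B1
then writes "0" to cpuset.cpus, member_effective(0x1, 0xC) in this model returns 0xC: the user asked
for CPU 0 and would read back 2-3.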

> I may not be seeing whole picture, so I ask -- why would it be "more
> reasonable" for A1 to remain root. From this description it looks like
> you'd silently convert B1's effective cpus to 2-3 but IIUC the code
> change that won't happen but you'd reject the write of "0-3" instead.
> 
> Isn't here missing Table 2: After applying the patch? I'm asking because
> of the number 1 but also because it'd make the intention clearer
> ;-), perhaps with a column for cpuset.cpus.effective.
> 
> Thanks,
> Michal

-- 
Best regards,
Ridong