Message-Id: <20251115060211.853449-1-sunshaojie@kylinos.cn>
Date: Sat, 15 Nov 2025 14:02:11 +0800
From: Sun Shaojie <sunshaojie@...inos.cn>
To: chenridong@...weicloud.com,
	mkoutny@...e.com,
	llong@...hat.com
Cc: cgroups@...r.kernel.org,
	hannes@...xchg.org,
	linux-kernel@...r.kernel.org,
	linux-kselftest@...r.kernel.org,
	shuah@...nel.org,
	tj@...nel.org
Subject: Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2

On 2025/11/15 08:58, Chen Ridong wrote:
>On 2025/11/15 0:14, Michal Koutný wrote:
>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@...weicloud.com> wrote:
>>> After further consideration, I still suggest retaining this rule.
>> 
>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>> Shaojie with both before/after tables will explain it.
>> 
>
>The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>exclusive" to
>"If both cpusets are exclusive, check if they are mutually exclusive"
>
>  -    /* If either cpuset is exclusive, check if they are mutually exclusive */
>  -    if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>  +    /* If both cpusets are exclusive, check if they are mutually exclusive */
>  +    if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>  +        return !cpusets_are_exclusive(cs1, cs2);
>
>I suggest not modifying this rule and keeping the original logic intact:
>
>>> For example:
>>>   Step                                       | A1's prstate | B1's prstate |
>>>   #1> mkdir -p A1                            | member       |              |
>>>   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>>>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>>   #4> mkdir -p B1                            | root         | member       |
>>>   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>>
>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>> invalidated?
>> 
>> A1 is invalidated because it doesn't have exclusive ownership of CPU 0
>> anymore.
>> 
>>> B1 could also use the parent's effective CPUs, right?
>> 
>> Here you assume some ordering between siblings treating A1 more
>> important than B1. But it's symmetrical in principle, no?
>> 
>
>I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>following the same logic—but I’m not in favor of doing so.

Hi, Ridong,

Thank you for pointing out the issue with the current patch; this is indeed
not what our product intends. I must admit that I have not thoroughly tested
it on recent kernel versions.

Clearly, this patch is flawed, so a v3 is needed. As for the "other rules"
you mentioned, we do not intend to relax them. On the contrary, we want them
to be strictly enforced.

Our product needs to ensure the following behavior: in cgroup v2, user
modifications to one cpuset should not affect the partition state of its
sibling cpusets. We believe this is justified, as it is consistent with the
isolation that cgroups are meant to provide.

This can be divided into two scenarios:
Scenario 1: Only one of A1 and B1 is "root".
Scenario 2: Both A1 and B1 are "root".

We plan to implement Scenario 1 first; that was the goal of patch v2.
However, patch v2 is flawed because it does not strictly adhere to the
existing rule below.

Note, however, that the current cgroup v2 implementation does not strictly
enforce this rule either; addressing that is also an objective of patch v3.

Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".
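As a rough illustration only, Rule 1 can be read as a plain subset test on CPU
masks. The sketch below is a toy model, not kernel code: it assumes a CPU list
fits in a 64-bit mask (bit n = CPU n), and the names cpumask64, cpu_range(),
is_subset() and violates_rule1() are hypothetical. It also assumes that an
empty "cpuset.cpus" (inherit from the parent) is always allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: bit n of the mask stands for CPU n. */
typedef uint64_t cpumask64;

/* Build a mask for the inclusive CPU range [first, last], e.g. "0-1". */
static cpumask64 cpu_range(unsigned first, unsigned last)
{
    assert(first <= last && last < 64);
    cpumask64 mask = 0;
    for (unsigned cpu = first; cpu <= last; cpu++)
        mask |= (cpumask64)1 << cpu;
    return mask;
}

/* True if every CPU in 'sub' is also in 'super'. */
static bool is_subset(cpumask64 sub, cpumask64 super)
{
    return (sub & ~super) == 0;
}

/*
 * Rule 1 as worded: the "cpuset.cpus" being written must not be a subset of
 * a sibling's "cpuset.cpus.exclusive". Returns true if the write should be
 * rejected. Assumption: an empty "cpuset.cpus" is always accepted.
 */
static bool violates_rule1(cpumask64 new_cpus, cpumask64 sibling_excl)
{
    if (new_cpus == 0)
        return false;
    return is_subset(new_cpus, sibling_excl);
}
```

With this literal reading, writing "0" to B1 while A1 holds "0-1" exclusively
is rejected, whereas a partial overlap such as "1-2" is not caught by this
particular check.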

To illustrate, using your example:
 Steps (numbered as in the tables below)
 #1> mkdir -p A1                           
 #2> echo "0-1" > A1/cpuset.cpus.exclusive 
 #3> echo "root" > A1/cpuset.cpus.partition
 #4> mkdir -p B1               
 #5> echo "0" > B1/cpuset.cpus 

Table 1: Current result
 Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
 #1   | 0      |                |           | member       |              |
 #2   | 0      | 0-1            |           | member       |              |
 #3   | 0      | 0-1            |           | root         |              |
 #4   | 0      | 0-1            |           | root         | member       |
 #5   | 0      | 0-1            | 0         | root invalid | member       |

Table 2: Expected result
 Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
 #1   | 0      |                |           | member       |              |
 #2   | 0      | 0-1            |           | member       |              |
 #3   | 0      | 0-1            |           | root         |              |
 #4   | 0      | 0-1            |           | root         | member       |
 #5   | error  | 0-1            |           | root         | member       |

Currently, step #5 returns success, which clearly violates Rule 1, because
B1's "cpuset.cpus" is a subset of A1's "cpuset.cpus.exclusive"; as a side
effect, A1 becomes "root invalid".

Therefore, step #5 should instead fail with an error, leaving B1's
"cpuset.cpus" unchanged and A1 as "root". This complies with Rule 1.
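The difference between Table 1 and Table 2 amounts to "validate, then commit"
versus "commit, then invalidate the sibling". The toy model below is a sketch
of that idea under stated assumptions, not the cpuset code path:
struct toy_cpuset, write_cpus_current() and write_cpus_expected() are
hypothetical names, CPU lists are 64-bit masks, and write_cpus_current() only
mimics the observed outcome in Table 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical toy model: bit n of a mask stands for CPU n. */
typedef uint64_t cpumask64;

enum prstate { PRS_MEMBER, PRS_ROOT, PRS_ROOT_INVALID };

struct toy_cpuset {
    cpumask64 cpus;       /* "cpuset.cpus" */
    cpumask64 excl_cpus;  /* "cpuset.cpus.exclusive" */
    enum prstate prstate; /* "cpuset.cpus.partition" */
};

/* Mimics Table 1: the write succeeds, and a sibling partition root whose
 * exclusive CPUs overlap the new "cpuset.cpus" is invalidated. */
static int write_cpus_current(struct toy_cpuset *cs,
                              struct toy_cpuset *sibling, cpumask64 new_cpus)
{
    assert(cs != sibling);
    cs->cpus = new_cpus;
    if (sibling->prstate == PRS_ROOT && (new_cpus & sibling->excl_cpus))
        sibling->prstate = PRS_ROOT_INVALID;
    return 0;
}

/* Models Table 2: validate first; on a Rule 1 violation change nothing and
 * return -EINVAL, so the sibling's partition state is never touched. */
static int write_cpus_expected(struct toy_cpuset *cs,
                               const struct toy_cpuset *sibling,
                               cpumask64 new_cpus)
{
    assert(cs != sibling);
    /* Rule 1: a non-empty "cpuset.cpus" must not be a subset of the
     * sibling's "cpuset.cpus.exclusive". */
    if (new_cpus != 0 && (new_cpus & ~sibling->excl_cpus) == 0)
        return -EINVAL;
    cs->cpus = new_cpus; /* commit only after validation passed */
    return 0;
}
```

The mail does not say how a partial overlap should be handled in Scenario 1;
this sketch simply commits it without touching the sibling.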

------
The following is provided for reference only; we hope it may be adopted in
the future. Note: none of it is part of what patch v3 will implement.

As for Scenario 2 (both A1 and B1 are "root"), we will retain the current
cgroup v2 behavior. This patch series does not modify it, but we would like
to draw the maintainers' attention to it, as we do plan to propose changes
in the future. The following example shows what we intend.

For example:
 Steps (numbered as in the tables below)
 #1> mkdir -p A1                           
 #2> echo "0-1"  > A1/cpuset.cpus 
 #3> echo "root" > A1/cpuset.cpus.partition
 #4> mkdir -p B1               
 #5> echo "2-3"  > B1/cpuset.cpus 
 #6> echo "root" > B1/cpuset.cpus.partition
 #7> echo "1-2"  > B1/cpuset.cpus

Table 3: Current result (Scenario 2)
 Step | A1's eff_cpus | B1's eff_cpus | A1's prstate | B1's prstate |
 #1   | from parent   |               | member       |              |
 #2   | 0-1           |               | member       |              |
 #3   | 0-1           |               | root         |              |
 #4   | 0-1           | from parent   | root         | member       |
 #5   | 0-1           | 2-3           | root         | member       |
 #6   | 0-1           | 2-3           | root         | root         |
 #7   | 0-1           | 1-2           | root invalid | root invalid |

Table 4: Expected result (Scenario 2)
 Step | A1's eff_cpus | B1's eff_cpus | A1's prstate | B1's prstate |
 #1   | from parent   |               | member       |              |
 #2   | 0-1           |               | member       |              |
 #3   | 0-1           |               | root         |              |
 #4   | 0-1           | from parent   | root         | member       |
 #5   | 0-1           | 2-3           | root         | member       |
 #6   | 0-1           | 2-3           | root         | root         |
 #7   | 0-1           | 2             | root         | root invalid |

After step #7, we expect A1 to remain "root" (unaffected), while only B1 
becomes "root invalid".
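A minimal sketch of the expected Scenario 2 outcome, assuming a toy model
rather than the kernel's partition code: when a partition root (B1) rewrites
its "cpuset.cpus" so that it overlaps a sibling root (A1), only B1 is
invalidated and keeps the CPUs A1 does not own. struct toy_cpuset,
write_cpus_scenario2() and the parent_eff parameter are hypothetical, and the
fallback for an empty result is an assumption not covered by the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: bit n of a mask stands for CPU n. */
typedef uint64_t cpumask64;

enum prstate { PRS_MEMBER, PRS_ROOT, PRS_ROOT_INVALID };

struct toy_cpuset {
    cpumask64 cpus;       /* "cpuset.cpus" as written by the user */
    cpumask64 eff_cpus;   /* "cpuset.cpus.effective" */
    enum prstate prstate; /* partition state */
};

/* Only 'cs' (already a partition root) is affected; 'sibling' is read-only. */
static void write_cpus_scenario2(struct toy_cpuset *cs,
                                 const struct toy_cpuset *sibling,
                                 cpumask64 new_cpus, cpumask64 parent_eff)
{
    assert(cs != sibling && cs->prstate != PRS_MEMBER);
    cs->cpus = new_cpus;
    if (sibling->prstate == PRS_ROOT && (new_cpus & sibling->eff_cpus)) {
        /* Conflict with a sibling partition: invalidate only 'cs'. */
        cs->prstate = PRS_ROOT_INVALID;
        /* Keep the requested CPUs that the sibling does not own. */
        cs->eff_cpus = new_cpus & ~sibling->eff_cpus;
        /* Assumption: if nothing is left, fall back to the parent's CPUs
         * that the sibling does not own. */
        if (cs->eff_cpus == 0)
            cs->eff_cpus = parent_eff & ~sibling->eff_cpus;
    } else {
        /* Simplification: no other validity checks are modeled here. */
        cs->prstate = PRS_ROOT;
        cs->eff_cpus = new_cpus;
    }
}
```

Replaying step #7 (A1 owns "0-1", B1 writes "1-2") with this model leaves A1
as "root" with "0-1", and B1 becomes "root invalid" with effective CPU "2",
matching the expected-result table for Scenario 2.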

 
Rules 2 and 3 below are also implemented and adhered to by our product,
but the current cgroup v2 implementation does not enforce them. Likewise,
we hope to draw the maintainers' attention to them; perhaps they can be
adopted in the future.

Rule 2: In a cpuset whose "cpuset.cpus" is not empty, "cpuset.cpus.effective"
        must either be a subset of it or be empty.

Rule 3: In a cpuset whose "cpuset.cpus" is not empty, "cpuset.cpus.exclusive"
        must either be a subset of it or be empty.

Rationale: "cpuset.cpus" represents the CPUs requested by the user, and the
        system should honor the user's intention.
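For reference, Rules 2 and 3 reduce to a "subset or empty" test applied to two
masks of the same cpuset. The sketch below assumes the same toy 64-bit mask
model as a stand-in for real cpumasks; subset_or_empty() and
satisfies_rules_2_and_3() are hypothetical names, not existing kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: bit n of a mask stands for CPU n. */
typedef uint64_t cpumask64;

/* True if 'mask' is empty or contained in 'cpus'. */
static bool subset_or_empty(cpumask64 mask, cpumask64 cpus)
{
    return mask == 0 || (mask & ~cpus) == 0;
}

/* Checks one cpuset's "cpuset.cpus", "cpuset.cpus.effective" and
 * "cpuset.cpus.exclusive" against Rules 2 and 3 as proposed. */
static bool satisfies_rules_2_and_3(cpumask64 cpus, cpumask64 eff,
                                    cpumask64 excl)
{
    if (cpus == 0) /* the rules only apply when "cpuset.cpus" is non-empty */
        return true;
    return subset_or_empty(eff, cpus)      /* Rule 2 */
        && subset_or_empty(excl, cpus);    /* Rule 3 */
}
```

For example, "cpuset.cpus" = "2-3" with "cpuset.cpus.effective" = "0-1" fails
Rule 2 under this check, and "cpuset.cpus" = "0-1" with
"cpuset.cpus.exclusive" = "1-2" fails Rule 3.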

---
Thanks,
Sun Shaojie


