[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20251115060211.853449-1-sunshaojie@kylinos.cn>
Date: Sat, 15 Nov 2025 14:02:11 +0800
From: Sun Shaojie <sunshaojie@...inos.cn>
To: chenridong@...weicloud.com,
mkoutny@...e.com,
llong@...hat.com
Cc: cgroups@...r.kernel.org,
hannes@...xchg.org,
linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org,
shuah@...nel.org,
tj@...nel.org
Subject: Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
On 2015/11/15 08:58, Chen Ridong wrote:
>On 2025/11/15 0:14, Michal Koutný wrote:
>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@...weicloud.com> wrote:
>>> After further consideration, I still suggest retaining this rule.
>>
>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>> Shaojie with both before/after tables will explain it.
>>
>
>The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>exclusive" to
>"If both cpusets are exclusive, check if they are mutually exclusive"
>
> - /* If either cpuset is exclusive, check if they are mutually exclusive */
> - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
> + /* If both cpusets are exclusive, check if they are mutually exclusive */
> + if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
> + return !cpusets_are_exclusive(cs1, cs2);
>
>I suggest not modifying this rule and keeping the original logic intact:
>
>>> For am example:
>>> Step | A1's prstate | B1's prstate |
>>> #1> mkdir -p A1 | member | |
>>> #2> echo "0-1" > A1/cpuset.cpus.exclusive | member | |
>>> #3> echo "root" > A1/cpuset.cpus.partition | root | |
>>> #4> mkdir -p B1 | root | member |
>>> #5> echo "0" > B1/cpuset.cpus | root invalid | member |
>>>
>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>> invalidated?
>>
>> A1 is invalidated becase it doesn't have exclusive ownership of CPU 0
>> anymore.
>>
>>> B1 could also use the parent's effective CPUs, right?
>>
>> Here you assume some ordering between siblings treating A1 more
>> important than B1. But it's symmetrical in principle, no?
>>
>
>I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>following the same logic—but I’m not in favor of doing so.
Hi, Ridong,
Thank you for pointing out the issue with the current patch; this is indeed
not what our product intends. I must admit that I haven't thoroughly tested
on such recent kernel versions.
Obviously, this patch is flawed. However, patch v3 is needed. Regarding the
"other rules" you mentioned, we do not intend to relax them. On the
contrary, we aim to maintain them firmly.
Our product need ensure the following behavior: in cgroup-v2, user
modifications to one cpuset should not affect the partition state of its
sibling cpusets. This is justified and meaningful, as it aligns with the
isolation characteristics of cgroups.
This can be divided into two scenarios:
Scenario 1: Only one of A1 and B1 is "root".
Scenario 2: Both A1 and B1 are "root".
We plan to implement Scenario 1 first. This is the goal of patch v2.
However, patch v2 is flawed because it does not strictly adhere to the
following existing rule.
However, it is worth noting that the current cgroup v2 implementation does
not strictly adhere to the following rule either (which is also an
objective for patch v3 to address).
Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".
Using your example to illustrate.
Step (refer to the steps in the table below)
#1> mkdir -p A1
#2> echo "0-1" > A1/cpuset.cpus.exclusive
#3> echo "root" > A1/cpuset.cpus.partition
#4> mkdir -p B1
#5> echo "0" > B1/cpuset.cpus
Table 1: Current result
Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
#1 | 0 | | | member | |
#2 | 0 | 0-1 | | member | |
#3 | 0 | 0-1 | | root | |
#4 | 0 | 0-1 | | root | member |
#5 | 0 | 0-1 | 0 | root invalid | member |
Table 2: Expected result
Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
#1 | 0 | | | member | |
#2 | 0 | 0-1 | | member | |
#3 | 0 | 0-1 | | root | |
#4 | 0 | 0-1 | | root | member |
#5 | error | 0-1 | | root | member |
Currently, after step #5, the operation returns success, which clearly
violates Rule 1, as B1's "cpuset.cpus" is a subset of A1's
"cpuset.cpus.exclusive".
Therefore, after step #5, the operation should return error, with A1
remaining as "root". This better complies with the Rule 1.
------
The following content is provided for reference, and we hope it may be
adopted in the future.
!!These are not part of what patch v3 will implement.
As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current
cgroup v2 behavior. This patch series does not modify it, but we hope to
draw the maintainers' attention, as we indeed have plans for future
modifications. Our intent can be seen from the following examples.
For example:
Step (refer to the steps in the table below)
#1> mkdir -p A1
#2> echo "0-1" > A1/cpuset.cpus
#3> echo "root" > A1/cpuset.cpus.partition
#4> mkdir -p B1
#5> echo "2-3" > B1/cpuset.cpus
#6> echo "root" > B1/cpuset.cpus.partition
#7> echo "1-2" > B1/cpuset.cpus
Table 1: Current result
Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
#1 | from parent | | member | |
#2 | 0-1 | | member | |
#3 | 0-1 | | root | |
#4 | 0-1 | from parent | root | member |
#5 | 0-1 | 2-3 | root | member |
#6 | 0-1 | 2-3 | root | root |
#7 | 0-1 | 1-2 | root invalid | root invalid |
Table 2: Expected result
Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
#1 | from parent | | member | |
#2 | 0-1 | | member | |
#3 | 0-1 | | root | |
#4 | 0-1 | from parent | root | member |
#5 | 0-1 | 2-3 | root | member |
#6 | 0-1 | 2-3 | root | root |
#7 | 0-1 | 2 | root | root invalid |
After step #7, we expect A1 to remain "root" (unaffected), while only B1
becomes "root invalid".
The following Rule 2 and Rule 3 are alsomplemented and adhered to by our
product. The current cgroup v2 implementation does not enforce them.
Likewise, we hope this will draw the maintainers' attention. Maybe, they can
be applied in the future.
Rule 2: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.effective"
must either be a subset of it, or "cpuset.cpus.effective" is null.
Rule 3: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.exclusive"
must either be a subset of it, or "cpuset.cpus.exclusive" is null.
Rationale: "cpuset.cpus" represents the CPUs requested by the user, and the
system should honor the user's intention.
---
Thanks,
Sun Shaojie
Powered by blists - more mailing lists