[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9332ba1-2614-44c2-b2e8-eab213f196bc@huaweicloud.com>
Date: Sat, 15 Nov 2025 15:41:03 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Sun Shaojie <sunshaojie@...inos.cn>, mkoutny@...e.com, llong@...hat.com
Cc: cgroups@...r.kernel.org, hannes@...xchg.org,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
shuah@...nel.org, tj@...nel.org
Subject: Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
On 2025/11/15 14:02, Sun Shaojie wrote:
> On 2015/11/15 08:58, Chen Ridong wrote:
>> On 2025/11/15 0:14, Michal Koutný wrote:
>>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@...weicloud.com> wrote:
>>>> After further consideration, I still suggest retaining this rule.
>>>
>>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>>> Shaojie with both before/after tables will explain it.
>>>
>>
>> The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>> exclusive" to
>> "If both cpusets are exclusive, check if they are mutually exclusive"
>>
>> - /* If either cpuset is exclusive, check if they are mutually exclusive */
>> - if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>> + /* If both cpusets are exclusive, check if they are mutually exclusive */
>> + if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>> + return !cpusets_are_exclusive(cs1, cs2);
>>
>> I suggest not modifying this rule and keeping the original logic intact:
>>
>>>> For am example:
>>>> Step | A1's prstate | B1's prstate |
>>>> #1> mkdir -p A1 | member | |
>>>> #2> echo "0-1" > A1/cpuset.cpus.exclusive | member | |
>>>> #3> echo "root" > A1/cpuset.cpus.partition | root | |
>>>> #4> mkdir -p B1 | root | member |
>>>> #5> echo "0" > B1/cpuset.cpus | root invalid | member |
>>>>
>>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>>> invalidated?
>>>
>>> A1 is invalidated becase it doesn't have exclusive ownership of CPU 0
>>> anymore.
>>>
>>>> B1 could also use the parent's effective CPUs, right?
>>>
>>> Here you assume some ordering between siblings treating A1 more
>>> important than B1. But it's symmetrical in principle, no?
>>>
>>
>> I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>> following the same logic—but I’m not in favor of doing so.
>
> Hi, Ridong,
>
> Thank you for pointing out the issue with the current patch; this is indeed
> not what our product intends. I must admit that I haven't thoroughly tested
> on such recent kernel versions.
>
> Obviously, this patch is flawed. However, patch v3 is needed. Regarding the
> "other rules" you mentioned, we do not intend to relax them. On the
> contrary, we aim to maintain them firmly.
>
> Our product need ensure the following behavior: in cgroup-v2, user
> modifications to one cpuset should not affect the partition state of its
> sibling cpusets. This is justified and meaningful, as it aligns with the
> isolation characteristics of cgroups.
>
This is ideal in theory, but I don’t think it’s practical in reality.
> This can be divided into two scenarios:
> Scenario 1: Only one of A1 and B1 is "root".
> Scenario 2: Both A1 and B1 are "root".
>
> We plan to implement Scenario 1 first. This is the goal of patch v2.
> However, patch v2 is flawed because it does not strictly adhere to the
> following existing rule.
>
> However, it is worth noting that the current cgroup v2 implementation does
> not strictly adhere to the following rule either (which is also an
> objective for patch v3 to address).
>
> Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".
>
> Using your example to illustrate.
> Step (refer to the steps in the table below)
> #1> mkdir -p A1
> #2> echo "0-1" > A1/cpuset.cpus.exclusive
> #3> echo "root" > A1/cpuset.cpus.partition
> #4> mkdir -p B1
> #5> echo "0" > B1/cpuset.cpus
>
> Table 1: Current result
> Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
> #1 | 0 | | | member | |
> #2 | 0 | 0-1 | | member | |
> #3 | 0 | 0-1 | | root | |
> #4 | 0 | 0-1 | | root | member |
> #5 | 0 | 0-1 | 0 | root invalid | member |
>
I think this what we expect.
> Table 2: Expected result
> Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
> #1 | 0 | | | member | |
> #2 | 0 | 0-1 | | member | |
> #3 | 0 | 0-1 | | root | |
> #4 | 0 | 0-1 | | root | member |
> #5 | error | 0-1 | | root | member |
>
Step 5 should not return an error. As Longman pointed out, in cgroup-v2, setting cpuset.cpus should
never fail.
> Currently, after step #5, the operation returns success, which clearly
> violates Rule 1, as B1's "cpuset.cpus" is a subset of A1's
> "cpuset.cpus.exclusive".
>
> Therefore, after step #5, the operation should return error, with A1
> remaining as "root". This better complies with the Rule 1.
>
This is an exclusivity rule. Since it violates the exclusivity rules, A1 should be invalidated.
> ------
> The following content is provided for reference, and we hope it may be
> adopted in the future.
> !!These are not part of what patch v3 will implement.
>
> As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current
> cgroup v2 behavior. This patch series does not modify it, but we hope to
> draw the maintainers' attention, as we indeed have plans for future
> modifications. Our intent can be seen from the following examples.
>
> For example:
> Step (refer to the steps in the table below)
> #1> mkdir -p A1
> #2> echo "0-1" > A1/cpuset.cpus
> #3> echo "root" > A1/cpuset.cpus.partition
> #4> mkdir -p B1
> #5> echo "2-3" > B1/cpuset.cpus
> #6> echo "root" > B1/cpuset.cpus.partition
> #7> echo "1-2" > B1/cpuset.cpus
>
> Table 1: Current result
> Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
> #1 | from parent | | member | |
> #2 | 0-1 | | member | |
> #3 | 0-1 | | root | |
> #4 | 0-1 | from parent | root | member |
> #5 | 0-1 | 2-3 | root | member |
> #6 | 0-1 | 2-3 | root | root |
> #7 | 0-1 | 1-2 | root invalid | root invalid |
>
> Table 2: Expected result
> Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
> #1 | from parent | | member | |
> #2 | 0-1 | | member | |
> #3 | 0-1 | | root | |
> #4 | 0-1 | from parent | root | member |
> #5 | 0-1 | 2-3 | root | member |
> #6 | 0-1 | 2-3 | root | root |
> #7 | 0-1 | 2 | root | root invalid |
>
> After step #7, we expect A1 to remain "root" (unaffected), while only B1
> becomes "root invalid".
>
With the result you expect, would we observe the following behaviors:
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > A1/cpuset.cpus.partition
#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid
#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1" > A1/cpuset.cpus
#4> echo "1-2" > B1/cpuset.cpus
#5> echo "root" > B1/cpuset.cpus.partition
#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root
Do different operation orders yield different results? If so, this is not what we expect.
--
Best regards,
Ridong
Powered by blists - more mailing lists