linux-kernel - Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <38ab0503-3176-43a0-b6b5-09de0fd9eb75@huaweicloud.com>
Date: Mon, 5 Jan 2026 15:00:44 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <llong@...hat.com>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutný
 <mkoutny@...e.com>, Jonathan Corbet <corbet@....net>,
 Shuah Khan <shuah@...nel.org>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
 linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org,
 Sun Shaojie <sunshaojie@...inos.cn>
Subject: Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail
 cpuset.cpus change in v2



On 2026/1/5 11:59, Waiman Long wrote:
> On 1/4/26 8:35 PM, Chen Ridong wrote:
>>
>> On 2026/1/5 5:48, Waiman Long wrote:
>>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>>> until valid partition") introduced a new check to disallow the setting
>>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>>> check does have the side effect of failing a cpuset.cpus change that
>>>>> make it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>>
>>>>> With v2, users are supposed to be allowed to set whatever value they
>>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>>> is now restricted to only when cpuset.cpus.exclusive is being changed
>>>>> not when cpuset.cpus is changed.
>>>>>
>>>> Hi, Longman,
>>>>
>>>> You've emphasized that modifying cpuset.cpus should never fail. While I haven't found this
>>>> explicitly documented. Should we add it?
>>>>
>>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>>> constraints? This seems to be the underlying assumption in this patch.
>>> Before the introduction of cpuset partition, writing to cpuset.cpus will only fail if the cpu list
>>> is invalid like containing CPUs outside of the valid cpu range. What I mean by "never-fail" is that
>>> if the cpu list is valid, the write action should not fail. The rule is not explicitly stated in the
>>> documentation, but it is a pre-existing behavior which we should try to keep to avoid breaking
>>> existing applications.
>>>
>> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
>> on device) and EBUSY.
>>
>> I just want to ensure the behavior aligns with our design intent.
>>
>> Consider this example:
>>
>> # cd /sys/fs/cgroup/
>> # mkdir test
>> # echo 1 > test/cpuset.cpus
>> # echo $$ > test/cgroup.procs
>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>> # echo > test/cpuset.cpus
>> -bash: echo: write error: No space left on device
>>
>> In cgroups v2, if the test cgroup becomes empty, it could inherit the parent's effective CPUs. My
>> question is: Should we still fail to clear cpuset.cpus (returning an error) when the cgroup is
>> populated?
> 
> Good catch. This error is for v1. It shouldn't apply for v2. Yes, I think we should fix that for v2.
> 

The EBUSY check (through cpuset_cpumask_can_shrink) is necessary, correct?

Since the subsequent patch modifies exclusive checking for v1, should we consolidate all v1-related
code into a separate function like cpuset1_validate_change() (maybe come duplicate code)?, it would
allow us to isolate v1 logic and avoid having to account for v1 implementation details in future
features.

In other words:

validate_change(...)
{
    if (!is_in_v2_mode())
        return cpuset1_validate_change(cur, trial);
    ...
    // only v2 code here
}

-- 
Best regards,
Ridong