linux-kernel - Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail cpuset.cpus change in v2

Message-ID: <828377cf-4a64-48b4-887e-8f71ebed502c@redhat.com>
Date: Thu, 8 Jan 2026 23:14:28 -0500
From: Waiman Long <llong@...hat.com>
To: Chen Ridong <chenridong@...weicloud.com>, Waiman Long <llong@...hat.com>,
 Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
 Michal Koutný <mkoutny@...e.com>,
 Jonathan Corbet <corbet@....net>, Shuah Khan <shuah@...nel.org>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
 linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org,
 Sun Shaojie <sunshaojie@...inos.cn>
Subject: Re: [cgroup/for-6.20 PATCH v2 3/4] cgroup/cpuset: Don't fail
 cpuset.cpus change in v2

On 1/5/26 2:00 AM, Chen Ridong wrote:
>
> On 2026/1/5 11:59, Waiman Long wrote:
>> On 1/4/26 8:35 PM, Chen Ridong wrote:
>>> On 2026/1/5 5:48, Waiman Long wrote:
>>>> On 1/4/26 2:09 AM, Chen Ridong wrote:
>>>>> On 2026/1/2 3:15, Waiman Long wrote:
>>>>>> Commit fe8cd2736e75 ("cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE
>>>>>> until valid partition") introduced a new check to disallow the setting
>>>>>> of a new cpuset.cpus.exclusive value that is a superset of a sibling's
>>>>>> cpuset.cpus value so that there will at least be one CPU left in the
>>>>>> sibling in case the cpuset becomes a valid partition root. This new
>>>>>> check does have the side effect of failing a cpuset.cpus change that
>>>>>> makes it a subset of a sibling's cpuset.cpus.exclusive value.
>>>>>>
>>>>>> With v2, users are supposed to be allowed to set whatever value they
>>>>>> want in cpuset.cpus without failure. To maintain this rule, the check
>>>>>> is now applied only when cpuset.cpus.exclusive is being changed,
>>>>>> not when cpuset.cpus is changed.
>>>>>>
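The sibling check described in the quoted commit message can be sketched as a small model. This is not the kernel code: cpumasks are assumed to be 64-bit bitmaps, `mask_subset()` and `sibling_check_fails()` are hypothetical names, and treating an empty sibling cpuset.cpus as non-conflicting is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n set = CPU n is in the mask. */
typedef uint64_t cpumask_bits;

/* True if every CPU in 'sub' is also in 'super'. */
static bool mask_subset(cpumask_bits sub, cpumask_bits super)
{
	return (sub & ~super) == 0;
}

/*
 * Reject a new cpuset.cpus.exclusive value that covers all of a
 * sibling's cpuset.cpus, since the sibling would be left with no CPU
 * if this cpuset became a valid partition root. Per the patch, the
 * check only runs when cpuset.cpus.exclusive itself is written
 * ('excl_changed'). An empty sibling mask is assumed not to conflict.
 */
static bool sibling_check_fails(bool excl_changed, cpumask_bits new_excl,
				cpumask_bits sibling_cpus)
{
	if (!excl_changed)
		return false;	/* a cpuset.cpus write is never rejected here */
	return sibling_cpus != 0 && mask_subset(sibling_cpus, new_excl);
}
```

Because `excl_changed` is false for a plain cpuset.cpus write, the sketch never rejects such a write, which is the v2 behavior the patch aims to preserve.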
>>>>> Hi, Longman,
>>>>>
>>>>> You've emphasized that modifying cpuset.cpus should never fail. However, I haven't found this
>>>>> rule explicitly documented. Should we add it?
>>>>>
>>>>> More importantly, does this mean the "never fail" rule has higher priority than the exclusive CPU
>>>>> constraints? This seems to be the underlying assumption in this patch.
>>>> Before the introduction of cpuset partitions, writing to cpuset.cpus would only fail if the CPU list
>>>> was invalid, for example if it contained CPUs outside of the valid CPU range. What I mean by
>>>> "never fail" is that if the CPU list is valid, the write should not fail. The rule is not explicitly
>>>> stated in the documentation, but it is pre-existing behavior that we should try to keep to avoid
>>>> breaking existing applications.
>>>>
>>> There are two conditions that can cause a cpuset.cpus write operation to fail: ENOSPC (No space left
>>> on device) and EBUSY.
>>>
>>> I just want to ensure the behavior aligns with our design intent.
>>>
>>> Consider this example:
>>>
>>> # cd /sys/fs/cgroup/
>>> # mkdir test
>>> # echo 1 > test/cpuset.cpus
>>> # echo $$ > test/cgroup.procs
>>> # echo 0 > /sys/devices/system/cpu/cpu1/online
>>> # echo > test/cpuset.cpus
>>> -bash: echo: write error: No space left on device
>>>
>>> In cgroups v2, if the test cgroup's cpuset.cpus becomes empty, the cgroup could inherit the parent's
>>> effective CPUs. My question is: should we still fail to clear cpuset.cpus (returning an error) when
>>> the cgroup is populated?
>> Good catch. This error is for v1 and shouldn't apply to v2. Yes, I think we should fix that for v2.
>>
> The EBUSY check (through cpuset_cpumask_can_shrink) is necessary, correct?

Yes, it is a check needed by the deadline scheduler irrespective of whether
v1 or v2 is used.
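Regarding the v2 fix agreed to above, a minimal sketch of why clearing cpuset.cpus need not fail in v2: it assumes the usual v2 rule that a cpuset's effective CPUs are its cpuset.cpus masked by the parent's effective CPUs, falling back to the parent's effective CPUs when that intersection is empty. Partitions and isolated CPUs are ignored, and `v2_effective_cpus()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit n set = CPU n is in the mask. */
typedef uint64_t cpumask_bits;

/*
 * An empty cpuset.cpus, or one whose CPUs are all unavailable to the
 * parent (e.g. offlined), falls back to the parent's effective CPUs,
 * so the cgroup's tasks always have somewhere to run.
 */
static cpumask_bits v2_effective_cpus(cpumask_bits cpus,
				      cpumask_bits parent_effective)
{
	cpumask_bits eff = cpus & parent_effective;

	return eff ? eff : parent_effective;
}
```

For example, with CPU 1 offline and the parent's effective CPUs being 0, 2 and 3 (mask 0xD), a cpuset.cpus of "1" yields 0xD rather than an empty set, and an empty cpuset.cpus yields the same result.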


>
> Since the subsequent patch modifies the exclusivity checking for v1, should we consolidate all
> v1-related code into a separate function such as cpuset1_validate_change() (perhaps at the cost of
> some duplicated code)? That would allow us to isolate the v1 logic and avoid having to account for
> v1 implementation details in future features.
>
> In other words:
>
> validate_change(...)
> {
>      if (!is_in_v2_mode())
>          return cpuset1_validate_change(cur, trial);
>      ...
>      // only v2 code here
> }
>
Yes, we could move the code to cpuset1_validate_change().
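The proposed split can be modeled as below. All names here (`struct trial_model`, `model_validate()` and its helpers) are hypothetical, and the model keeps only the two error paths discussed in this thread: -ENOSPC for emptying cpuset.cpus of a populated cgroup (v1 only, per the fix agreed above) and -EBUSY from the deadline shrink check (both modes).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the trial state. */
struct trial_model {
	bool populated;      /* cgroup has tasks */
	bool cpus_empty;     /* new cpuset.cpus is empty */
	bool dl_can_shrink;  /* outcome of the deadline shrink check */
};

/* v1-only rule: an empty cpuset.cpus on a populated cgroup is refused. */
static int model_v1_validate(const struct trial_model *t)
{
	if (t->populated && t->cpus_empty)
		return -ENOSPC;
	return 0;
}

/* Shared by v1 and v2: the deadline scheduler must allow the shrink. */
static int model_common_checks(const struct trial_model *t)
{
	return t->dl_can_shrink ? 0 : -EBUSY;
}

static int model_validate(bool v2, const struct trial_model *t)
{
	if (!v2) {
		int ret = model_v1_validate(t);

		if (ret)
			return ret;
	}
	/* v2: an empty cpuset.cpus just inherits the parent's CPUs. */
	return model_common_checks(t);
}
```

Unlike the outline quoted above, where the v1 path returns directly from cpuset1_validate_change(), this sketch calls a shared helper from both paths so the deadline check is not duplicated; either arrangement would work.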

Cheers,
Longman
