Message-ID: <45f5e2c6-42ec-4d77-9c2d-0e00472a05de@huaweicloud.com>
Date: Thu, 27 Nov 2025 09:55:21 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <llong@...hat.com>, Michal Koutný
 <mkoutny@...e.com>
Cc: Sun Shaojie <sunshaojie@...inos.cn>, cgroups@...r.kernel.org,
 hannes@...xchg.org, linux-kernel@...r.kernel.org,
 linux-kselftest@...r.kernel.org, shuah@...nel.org, tj@...nel.org
Subject: Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on
 cpuset.cpus conflict.



On 2025/11/27 3:43, Waiman Long wrote:
> On 11/26/25 9:13 AM, Michal Koutný wrote:
>> On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@...hat.com> wrote:
>>> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
>>> lists overlap, we can't have both of them as valid partition roots. So
>>> either one of A1 or B1 is valid or they are both invalid. The current code
>>> makes them both invalid no matter the operation ordering.  This patch will

I have to admit that I prefer the current implementation.

At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
make it more difficult for users to understand why the cpuset.cpus they configured do not match the
effective CPUs in use, and why different operation orders yield different results.

In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
created under A1 will end up with empty effective CPUs, which is not the desired behavior.

   root cgroup
        |
       A1
      /  \
    A2    A3...

 #1> echo "0-1" > A1/cpuset.cpus
 #2> echo "root" > A1/cpuset.cpus.partition
 #3> echo "0-1" > A2/cpuset.cpus
 #4> echo "root" > A2/cpuset.cpus.partition
 mkdir A4
 mkdir A5
 echo "0" > A4/cpuset.cpus
 echo $$ > A4/cgroup.procs
 echo "1" > A5/cpuset.cpus
 echo $$ > A5/cgroup.procs
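
One way to observe the outcome of a sequence like the one above is to compare each child's
cpuset.cpus with its cpuset.cpus.effective. A minimal sketch follows. The helper name
show_effective and its directory argument are hypothetical (not part of the kernel or any
existing tool), and it assumes a cgroup v2 hierarchy in which each child directory exposes
both files:

```shell
# show_effective DIR: hypothetical helper, not part of the kernel or any tool.
# For every child cgroup directory under DIR, print the configured CPU list
# next to the effective one, flagging children whose effective list is empty.
show_effective() {
    base=$1
    for dir in "$base"/*/; do
        # Skip anything that is not a cpuset-enabled cgroup directory.
        [ -f "${dir}cpuset.cpus.effective" ] || continue
        name=$(basename "$dir")
        cpus=$(cat "${dir}cpuset.cpus" 2>/dev/null)
        eff=$(cat "${dir}cpuset.cpus.effective")
        if [ -z "$eff" ]; then
            printf '%s: cpus=%s effective=(empty)\n' "$name" "$cpus"
        else
            printf '%s: cpus=%s effective=%s\n' "$name" "$cpus" "$eff"
        fi
    done
}
```

For example, assuming the hierarchy is mounted at /sys/fs/cgroup, it could be run as
show_effective /sys/fs/cgroup/A1 after the steps above.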


[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
requirement?" --Michal.

>>> make one of them valid given the operation ordering above. To minimize
>>> partition invalidation, we will have to live with the fact that it will be
>>> first-come first-serve as noted by Michal. I am not against this, we just
>>> have to document it. However, the following operation order will still make
>>> both of them invalid:
>> I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
>> race conditions in practice.
>> BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
>> in this regard?
> 
> Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more
> tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can
> actually vary depending on the actual serialization results of those operations.
> 
> One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on
> cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a
> valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The
> use of cpuset.cpus.exclusive is required for creating a remote partition.
> 
> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed
> and is limited to the creation of local partitions only.
> 
> Does that answer your question?
> 
> Cheers,
> Longman
> 

-- 
Best regards,
Ridong

