Message-ID: <45f5e2c6-42ec-4d77-9c2d-0e00472a05de@huaweicloud.com>
Date: Thu, 27 Nov 2025 09:55:21 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <llong@...hat.com>, Michal Koutný
<mkoutny@...e.com>
Cc: Sun Shaojie <sunshaojie@...inos.cn>, cgroups@...r.kernel.org,
hannes@...xchg.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, shuah@...nel.org, tj@...nel.org
Subject: Re: [PATCH v5] cpuset: Avoid invalidating sibling partitions on
cpuset.cpus conflict.
On 2025/11/27 3:43, Waiman Long wrote:
> On 11/26/25 9:13 AM, Michal Koutný wrote:
>> On Mon, Nov 24, 2025 at 05:30:47PM -0500, Waiman Long <llong@...hat.com> wrote:
>>> In the example above, the final configuration is A1:0-1 & B1:1-2. As the cpu
>>> lists overlap, we can't have both of them as valid partition roots. So
>>> either one of A1 or B1 is valid or they are both invalid. The current code
>>> makes them both invalid no matter the operation ordering. This patch will
I have to admit that I prefer the current implementation.
At the very least, it ensures that all partitions are treated fairly[1]. Relaxing this rule would
make it more difficult for users to understand why the cpuset.cpus they configured does not match the
effective CPUs in use, and why different operation orders yield different results.
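The ordering concern can be illustrated with a toy model. This is only a sketch and not kernel code: CPU lists are represented as bitmasks (CPU n is bit n), and the two functions `both_invalid` and `fcfs` are hypothetical names for the two policies under discussion, reduced to the two-sibling case. The real first-come first-served behavior proposed in the patch has more cases than this model captures.

```shell
#!/bin/sh
# Toy model only (assumption: two siblings, CPU n encoded as bit n).
# Arguments: <first_name> <first_mask> <second_name> <second_mask>,
# where "first" is the sibling that became a partition root earlier.

# Policy resembling the current behavior: overlapping siblings are both invalid.
both_invalid() {
    if [ $(( $2 & $4 )) -ne 0 ]; then
        echo "$1=invalid $3=invalid"
    else
        echo "$1=valid $3=valid"
    fi
}

# Simplified first-come first-served policy: the earlier sibling stays valid.
fcfs() {
    if [ $(( $2 & $4 )) -ne 0 ]; then
        echo "$1=valid $3=invalid"
    else
        echo "$1=valid $3=valid"
    fi
}

A1=3   # CPUs 0-1
B1=6   # CPUs 1-2

both_invalid A1 "$A1" B1 "$B1"   # same outcome for both orders
both_invalid B1 "$B1" A1 "$A1"
fcfs A1 "$A1" B1 "$B1"           # outcome depends on who came first
fcfs B1 "$B1" A1 "$A1"
```

Under the first policy the result is the same whichever sibling is configured first; under the simplified FCFS policy the valid partition is whichever one was set up earlier.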
In another scenario, if we do not invalidate the siblings, new leaf cpusets (marked as member)
created under A1 will end up with empty effective CPUs, which is not the desired behavior.
    root cgroup
        |
        A1
       /  \
     A2    A3...
#1> echo "0-1" > A1/cpuset.cpus
#2> echo "root" > A1/cpuset.cpus.partition
#3> echo "0-1" > A2/cpuset.cpus
#4> echo "root" > A2/cpuset.cpus.partition
mkdir A4
mkdir A5
echo "0" > A4/cpuset.cpus
echo $$ > A4/cgroup.procs
echo "1" > A5/cpuset.cpus
echo $$ > A5/cgroup.procs
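
To observe the outcome of a sequence like the one above, the configured and effective CPU lists can be read back from each cgroup. The helper below is a minimal sketch; `show_cpuset` is a hypothetical name, and the directory arguments are whatever paths the cgroups actually live under on the test system. It only reads the standard cgroup v2 files cpuset.cpus, cpuset.cpus.effective and cpuset.cpus.partition, and skips any that are not readable.

```shell
#!/bin/sh
# show_cpuset <cgroup_dir>...
# Print the configured CPUs, effective CPUs and partition state of each
# given cgroup directory. Files that do not exist are silently skipped.
show_cpuset() {
    for cg in "$@"; do
        for f in cpuset.cpus cpuset.cpus.effective cpuset.cpus.partition; do
            if [ -r "$cg/$f" ]; then
                printf '%s/%s: %s\n' "$cg" "$f" "$(cat "$cg/$f")"
            fi
        done
    done
}
```

For example, `show_cpuset A1 A2 A4 A5`, run from the directory where those cgroups were created, would show whether A4 and A5 ended up with an empty cpuset.cpus.effective.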
[1]: "B1 is a second-class partition only because it starts later or why is it OK to not fulfill its
requirement?" --Michal.
>>> make one of them valid given the operation ordering above. To minimize
>>> partition invalidation, we will have to live with the fact that it will be
>>> first-come first-serve as noted by Michal. I am not against this, we just
>>> have to document it. However, the following operation order will still make
>>> both of them invalid:
>> I'm skeptical of the FCFS behavior since I'm afraid it may be subject to
>> race conditions in practice.
>> BTW should cpuset.cpus and cpuset.cpus.exclusive have different behavior
>> in this regard?
>
> Modifications to cpumasks are all serialized by the cpuset_mutex. If you are referring to 2 or more
> tasks doing parallel updates to various cpuset control files of sibling cpusets, the results can
> actually vary depending on the actual serialization results of those operations.
>
> One difference between cpuset.cpus and cpuset.cpus.exclusive is the fact that operations on
> cpuset.cpus.exclusive can fail if the result is not exclusive WRT sibling cpusets, but becoming a
> valid partition is guaranteed unless none of the exclusive CPUs are passed down from the parent. The
> use of cpuset.cpus.exclusive is required for creating a remote partition.
>
> OTOH, changes to cpuset.cpus will never fail, but becoming a valid partition root is not guaranteed
> and is limited to the creation of a local partition only.
>
> Does that answer your question?
>
> Cheers,
> Longman
>
--
Best regards,
Ridong