[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <94d62350-c449-47b3-a423-9fa5ff5b984c@huaweicloud.com>
Date: Mon, 12 Jan 2026 11:56:17 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <llong@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Koutný
<mkoutny@...e.com>, Jonathan Corbet <corbet@....net>,
Shuah Khan <shuah@...nel.org>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-doc@...r.kernel.org,
Sun Shaojie <sunshaojie@...inos.cn>
Subject: Re: [PATCH cgroup/for-6.20 v3 5/5] cgroup/cpuset: Move the v1 empty
cpus/mems check to cpuset1_validate_change()
On 2026/1/12 11:47, Waiman Long wrote:
> On 1/11/26 9:35 PM, Chen Ridong wrote:
>>
>> On 2026/1/12 10:29, Chen Ridong wrote:
>>>
>>> On 2026/1/10 9:32, Waiman Long wrote:
>>>> As stated in commit 1c09b195d37f ("cpuset: fix a regression in validating
>>>> config change"), it is not allowed to clear masks of a cpuset if
>>>> there're tasks in it. This is specific to v1 since empty "cpuset.cpus"
>>>> or "cpuset.mems" will cause the v2 cpuset to inherit the effective CPUs
>>>> or memory nodes from its parent. So it is OK to have empty cpus or mems
>>>> even if there are tasks in the cpuset.
>>>>
>>>> Move this empty cpus/mems check in validate_change() to
>>>> cpuset1_validate_change() to allow more flexibility in setting
>>>> cpus or mems in v2. cpuset_is_populated() needs to be moved into
>>>> cpuset-internal.h as it is needed by the empty cpus/mems checking code.
>>>>
>>>> Also add a test case to test_cpuset_prs.sh to verify that.
>>>>
>>>> Reported-by: Chen Ridong <chenridong@...weicloud.com>
>>>> Closes: https://lore.kernel.org/lkml/7a3ec392-2e86-4693-aa9f-1e668a668b9c@huaweicloud.com/
>>>> Signed-off-by: Waiman Long <longman@...hat.com>
>>>> ---
>>>> kernel/cgroup/cpuset-internal.h | 9 ++++++++
>>>> kernel/cgroup/cpuset-v1.c | 14 +++++++++++
>>>> kernel/cgroup/cpuset.c | 23 -------------------
>>>> .../selftests/cgroup/test_cpuset_prs.sh | 3 +++
>>>> 4 files changed, 26 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>>> index e8e2683cb067..fd7d19842ded 100644
>>>> --- a/kernel/cgroup/cpuset-internal.h
>>>> +++ b/kernel/cgroup/cpuset-internal.h
>>>> @@ -260,6 +260,15 @@ static inline int nr_cpusets(void)
>>>> return static_key_count(&cpusets_enabled_key.key) + 1;
>>>> }
>>>> +static inline bool cpuset_is_populated(struct cpuset *cs)
>>>> +{
>>>> + lockdep_assert_cpuset_lock_held();
>>>> +
>>>> + /* Cpusets in the process of attaching should be considered as populated */
>>>> + return cgroup_is_populated(cs->css.cgroup) ||
>>>> + cs->attach_in_progress;
>>>> +}
>>>> +
>>>> /**
>>>> * cpuset_for_each_child - traverse online children of a cpuset
>>>> * @child_cs: loop cursor pointing to the current child
>>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>>> index 04124c38a774..7a23b9e8778f 100644
>>>> --- a/kernel/cgroup/cpuset-v1.c
>>>> +++ b/kernel/cgroup/cpuset-v1.c
>>>> @@ -368,6 +368,20 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
>>>> if (par && !is_cpuset_subset(trial, par))
>>>> goto out;
>>>> + /*
>>>> + * Cpusets with tasks - existing or newly being attached - can't
>>>> + * be changed to have empty cpus_allowed or mems_allowed.
>>>> + */
>>>> + ret = -ENOSPC;
>>>> + if (cpuset_is_populated(cur)) {
>>>> + if (!cpumask_empty(cur->cpus_allowed) &&
>>>> + cpumask_empty(trial->cpus_allowed))
>>>> + goto out;
>>>> + if (!nodes_empty(cur->mems_allowed) &&
>>>> + nodes_empty(trial->mems_allowed))
>>>> + goto out;
>>>> + }
>>>> +
>>>> ret = 0;
>>>> out:
>>>> return ret;
>>> The current implementation is sufficient.
>>>
>>> However, I suggest we fully separate the validation logic for v1 and v2. While this may introduce
>>> some code duplication (likely minimal), it would allow us to modify the validate_change logic for v2
>>> in the future without needing to consider v1 compatibility. Given that v1 is unlikely to see further
>>> changes, this separation would be a practical long-term decision.
>>>
>>> @@ -368,6 +368,48 @@ int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial)
>>> if (par && !is_cpuset_subset(trial, par))
>>> goto out;
>>>
>>> + /*
>>> + * Cpusets with tasks - existing or newly being attached - can't
>>> + * be changed to have empty cpus_allowed or mems_allowed.
>>> + */
>>> + ret = -ENOSPC;
>>> + if (cpuset_is_populated(cur)) {
>>> + if (!cpumask_empty(cur->cpus_allowed) &&
>>> + cpumask_empty(trial->cpus_allowed))
>>> + goto out;
>>> + if (!nodes_empty(cur->mems_allowed) &&
>>> + nodes_empty(trial->mems_allowed))
>>> + goto out;
>>> + }
>>> +
>>> + /*
>>> + * We can't shrink if we won't have enough room for SCHED_DEADLINE
>>> + * tasks. This check is not done when scheduling is disabled as the
>>> + * users should know what they are doing.
>>> + *
>>> + * For v1, effective_cpus == cpus_allowed & user_xcpus() returns
>>> + * cpus_allowed.
>>> + *
>>> + */
>>> + ret = -EBUSY;
>>> + if (is_cpu_exclusive(cur) && is_sched_load_balance(cur) &&
>>> + !cpuset_cpumask_can_shrink(cur->effective_cpus, user_xcpus(trial)))
>>> + goto out;
>>> +
>>> + /*
>>> + * If either I or some sibling (!= me) is exclusive, we can't
>>> + * overlap. exclusive_cpus cannot overlap with each other if set.
>>> + */
>>> + ret = -EINVAL;
>>> + cpuset_for_each_child(c, css, par) {
>>> + if (c == cur)
>>> + continue;
>>> + if (cpuset1_cpus_excl_conflict(trial, c))
>>> + goto out;
>>> + if (mems_excl_conflict(trial, c))
>>> + goto out;
>>> + }
>>> +
>>> ret = 0;
>>> out:
>>> return ret;
>>>
>> A major redundancy is in the cpuset_cpumask_can_shrink check. By placing cpuset1_cpus_excl_conflict
>> within the v1 path, we could simplify the overall cpus_excl_conflict function as well.
>
> This is additional cleanup work. It can be done as a follow-on patch later on.
>
Okay, it looks good for me.
Since you are going to update patch 4, maybe you can just add a patch to clean up.
Reviewed-by: Chen Ridong <chenridong@...wei.com>
--
Best regards,
Ridong
Powered by blists - more mailing lists