lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6e2e4f6a-a6a0-47b1-b037-c52f0c786018@huaweicloud.com>
Date: Wed, 12 Nov 2025 12:07:03 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Waiman Long <llong@...hat.com>, Michal Koutný
 <mkoutny@...e.com>
Cc: tj@...nel.org, hannes@...xchg.org, cgroups@...r.kernel.org,
 linux-kernel@...r.kernel.org, lujialin4@...wei.com, chenridong@...wei.com
Subject: Re: [PATCH next] cpuset: Treat tasks in attaching process as
 populated



On 2025/11/12 10:21, Waiman Long wrote:
> On 11/11/25 8:58 PM, Chen Ridong wrote:
>>
>> On 2025/11/12 4:35, Waiman Long wrote:
>>> On 11/11/25 2:25 PM, Michal Koutný wrote:
>>>> On Tue, Nov 11, 2025 at 10:16:33AM -0500, Waiman Long <llong@...hat.com> wrote:
>>>>> For internal helper like this one, we may not really need that as
>>>>> almost all the code in cpuset.c are within either a cpuset_mutex or
>>>>> callback_lock critical sections. So I am fine with or without it.
>>>> OK, cpuset_mutex and callback_lock are close but cgroup_is_populated()
>>>> that caught my eye would also need cgroup_mutex otherwise "the result
>>>> can only be used as a hint" (quote from cgroup.h).
>>>>
>>>> Or is it safe to assume that cpuset_mutex inside cpuset_attach() is
>>>> sufficient to always (incl. exits) ensure stability of
>>>> cgroup_is_populated() result?
>>>>
>>>> Anyway, I'd find some clarifications in the commit message or the
>>>> surrounding code about this helpful. (Judgment call, whether with a
>>>> lockdep macro. My opinion is -- why not.)
>>> For attach_in_progress, it is protected by the cpuset_mutex. So it may make sense to add a
>>> lockdep_assert_held() for that.
>>>
>> Will add.
>>
>>> You are right that there are problems WRT the stability of cgroup_is_populated() value.
>>>
>>> I think "cgrp->nr_populated_csets + cs->attach_in_progress" should be almost stable for the cgroup
>>> itself with cpuset_mutex, but there can be a small timing window after cpuset_attach(), but before
>>> the stat is updated where the sum is 0, but there are actually tasks in the cgroup.
>>>
>> Do you mean there’s a small window after ss->attach (i.e., cpuset_attach) where
>> cgrp->nr_populated_csets + cs->attach_in_progress could be 0?
>>
>> If I understand correctly:
>>
>> ss->can_attach: cs->attach_in_progress++, sum > 0
>> css_set_move_task->css_set_update_populated: cgrp->nr_populated_csets++, sum > 0
>> ss->attach->cpuset_attach: cs->attach_in_progress--, sum > 0
>>
>> What exactly is the small window you’re referring to where the sum equals 0?
> Yes, the nr_populated_csets is incremented before calling cpuset_attach(). I think I had mixed up
> the ordering. Thanks for the clarification.
>>
>>> For "cgrp->nr_populated_domain_children + cgrp->nr_populated_threaded_children", it also has the
>>> problem that the sum can be 0 but there are attach_in_progress set in one or more of the child
>>> cgroups. So even with this patch, we can't guarantee 100% that there can be no task in the partition
>>> even if it has empty effective_cpus. It is only a problem for nested local partitions though as
>>> remote partitions are not allowed to exhaust all the CPUs from root cgroup.
>>>
>>> We should probably document that limitation to warn users if they try to create nested local
>>> partitions where the parent partition root of the child partitions has empty effective_cpus.
>>>
>> Hmm, but it was what the commit e2d59900d936 ("cgroup/cpuset: Allow no-task partition to have empty
>> cpuset.cpus.effective") allowed, and it makes sense.
> 
> This commit will allow user to not waste a CPU if users want to set up a nested local partitions
> where there is no task in some intermediate levels. One way to address this gap is to iterate down
> the cpuset hierarchy to check if any of its attach_in_progress count is set. So it is a solvable
> problem.
> 

Yes, the modification needed is to replace cpuset_for_each_child with cpuset_for_each_descendant_pre
in partition_is_populated.

> There is also a flip side where cgroup_is_populated() returns a non-zero value, but the tasks are
> actively being moved away. This false positive is less a problem even though it can cause a failure
> in distributing out all the CPUs to child partitions. The users can retry and it should work the
> second time.
> 

Agree. This is a short-lived state, and it’s difficult to determine if any tasks are in the
partition during this period. Users should retry.

-- 
Best regards,
Ridong


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ