Message-ID: <28004f86-72a4-44eb-aa0a-0c9c0a1d6671@linux.alibaba.com>
Date: Thu, 4 Sep 2025 14:38:56 +0800
From: escape <escape@...ux.alibaba.com>
To: Tejun Heo <tj@...nel.org>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cgroup: replace global percpu_rwsem with
signal_struct->group_rwsem when writing cgroup.procs/threads
On 2025/9/4 11:15, escape wrote:
>
> On 2025/9/4 00:53, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Sep 03, 2025 at 07:11:07PM +0800, Yi Tao wrote:
>>> As computer hardware advances, modern systems are typically equipped
>>> with many CPU cores and large amounts of memory, enabling the
>>> deployment of numerous applications. On such systems, container
>>> creation and deletion become frequent operations, making cgroup
>>> process migration no longer a cold path. This leads to noticeable
>>> contention with common process operations such as fork, exec, and exit.
>> If you use CLONE_INTO_CGROUP, cgroup migration doesn't just become
>> cold. It disappears completely and CLONE_INTO_CGROUP doesn't need any
>> global locks from cgroup side. Are there reasons why you can't use
>> CLONE_INTO_CGROUP?
>>
>> Thanks.
>>
> As Ridong pointed out, in the current code, using CLONE_INTO_CGROUP
> still requires holding the threadgroup_rwsem, so contention with fork
> operations persists.
Sorry, my understanding here was wrong. Using CLONE_INTO_CGROUP does
indeed avoid the contention with fork, but the limitations described
below still apply.
Thanks.
>
> CLONE_INTO_CGROUP helps alleviate the contention between cgroup creation
> and deletion, but its usage comes with significant limitations:
>
> 1. CLONE_INTO_CGROUP is only available in cgroup v2. Although cgroup v2
> adoption is gradually increasing, many applications have not yet been
> adapted to cgroup v2, and phasing out cgroup v1 will be a long and
> gradual process.
>
> 2. CLONE_INTO_CGROUP requires specifying the cgroup file descriptor at
> the time of process fork, effectively restricting cgroup migration to
> the fork stage. This differs significantly from the typical cgroup attach
> workflow. For example, in Kubernetes, systemd is the recommended cgroup
> driver; kubelet communicates with systemd via D-Bus, and systemd
> performs the actual cgroup attachment. In this case, the process being
> attached typically does not have systemd as its parent. Using
> CLONE_INTO_CGROUP in such a scenario is impractical and would require
> coordinated changes to both systemd and kubelet.
>
> Thanks.
>
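To make the difference between the two paths in point 2 concrete, here is
a rough sketch rather than code taken from systemd or kubelet. The helper
names (fill_clone_args, spawn_into_cgroup, attach_pid) are made up for
illustration. It assumes kernel headers from Linux 5.7 or later, which
provide CLONE_INTO_CGROUP and the cgroup field of struct clone_args, and a
cgroup v2 hierarchy.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/sched.h>   /* struct clone_args, CLONE_INTO_CGROUP */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: prepare clone3() arguments so that the child is
 * created directly inside the cgroup referred to by cgroup_fd, an fd
 * opened on a cgroup v2 directory. */
void fill_clone_args(struct clone_args *args, int cgroup_fd)
{
    memset(args, 0, sizeof(*args));
    args->flags = CLONE_INTO_CGROUP;
    args->exit_signal = SIGCHLD;
    args->cgroup = (uint64_t)cgroup_fd;
}

/* Path 1: the caller must be the parent and must already hold the cgroup
 * fd at fork time. Returns 0 in the child, the child's pid in the parent,
 * or -1 on error. */
pid_t spawn_into_cgroup(int cgroup_fd)
{
    struct clone_args args;

    fill_clone_args(&args, cgroup_fd);
    return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}

/* Path 2: the attach workflow. Any sufficiently privileged process can
 * migrate an already running pid by writing it to cgroup.procs; the
 * writer does not need to be the target's parent. Returns 0 on success,
 * -1 on error. */
int attach_pid(const char *cgroup_dir, pid_t pid)
{
    char path[4096], buf[32];
    int fd, len, ret;

    len = snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;

    fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    len = snprintf(buf, sizeof(buf), "%d\n", (int)pid);
    ret = (write(fd, buf, (size_t)len) == len) ? 0 : -1;
    close(fd);
    return ret;
}
```

In the systemd case, the manager only ever sees a pid handed over via
D-Bus, so only something like attach_pid() is applicable there;
spawn_into_cgroup() would have to be called by whatever actually forks
the workload.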