Message-ID: <aLk3Fftch9lUMJTv@slm.duckdns.org>
Date: Wed, 3 Sep 2025 20:52:05 -1000
From: Tejun Heo <tj@...nel.org>
To: Chen Ridong <chenridong@...weicloud.com>
Cc: Michal Koutný <mkoutny@...e.com>,
Yi Tao <escape@...ux.alibaba.com>, hannes@...xchg.org,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cgroup: replace global percpu_rwsem with
signal_struct->group_rwsem when writing cgroup.procs/threads
Hello,
On Thu, Sep 04, 2025 at 09:40:12AM +0800, Chen Ridong wrote:
...
> > Sorry, I was confused. We no longer need to write lock threadgroup rwsem
> > when CLONE_INTO_CGROUP'ing into an empty cgroup. We do still need
> > cgroup_mutex.
> >
> > 671c11f0619e ("cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree")
> >
> > Thanks.
> >
>
> I'm still a bit confused. Commit 671c11f0619e ("cgroup: Elide write-locking threadgroup_rwsem when
> updating csses on an empty subtree") only applies to CSS updates. However, cloning with
> CLONE_INTO_CGROUP still requires acquiring the threadgroup_rwsem.
>
> cgroup_can_fork
>   cgroup_css_set_fork
>     if (kargs->flags & CLONE_INTO_CGROUP)
>       cgroup_lock();
>     cgroup_threadgroup_change_begin(current);
Ah, yeah, I'm misremembering things, sorry. What got elided in that commit
is down_write of threadgroup_rwsem when enabling controllers on empty
cgroups, which was the only operation that still needed to down_write the
rwsem. Here's an excerpt from the commit message:
  After this optimization, the usage pattern of creating a cgroup, enabling
  the necessary controllers, and then seeding it with CLONE_INTO_CGROUP and
  then removing the cgroup after it becomes empty doesn't need to write-lock
  threadgroup_rwsem at all.
It's true that cgroup_threadgroup_change_begin() down_reads the
threadgroup_rwsem, but that is a percpu_rwsem whose read operations are,
in the common case where no writer is active, just percpu inc/dec. This
doesn't add any noticeable overhead or raise any scalability concerns.
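To illustrate why the read side is cheap, here is a toy userspace model of a per-CPU reader-count rwsem in C11 atomics. It is not the kernel's struct percpu_rw_semaphore: the names (toy_pcpu_rwsem, NR_SLOTS, the toy_* functions) are hypothetical, the "CPU" is just an index the caller passes in, and sequentially consistent atomics stand in for the kernel's rcu_sync machinery, which is what lets the real fast path get away with a plain this_cpu_inc() and no barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 8 /* hypothetical stand-in for the number of CPUs */

/* Toy model only; not the kernel's struct percpu_rw_semaphore. */
struct toy_pcpu_rwsem {
    atomic_long read_count[NR_SLOTS]; /* one reader counter per "CPU" */
    atomic_bool writer_active;        /* set while a writer holds the lock */
};

/* Reader: bump only this CPU's counter, then check for a writer.
 * seq_cst ordering here replaces the kernel's RCU-based fast/slow switch. */
static bool toy_down_read_trylock(struct toy_pcpu_rwsem *s, int cpu)
{
    atomic_fetch_add(&s->read_count[cpu], 1);
    if (!atomic_load(&s->writer_active))
        return true;                          /* common case: local inc only */
    atomic_fetch_sub(&s->read_count[cpu], 1); /* writer present: back off */
    return false;
}

/* The caller passes the same slot it locked with; only the sum matters. */
static void toy_up_read(struct toy_pcpu_rwsem *s, int cpu)
{
    atomic_fetch_sub(&s->read_count[cpu], 1);
}

/* Writer: claim the flag, then it must look at every CPU's counter. */
static bool toy_down_write_trylock(struct toy_pcpu_rwsem *s)
{
    bool expected = false;
    if (!atomic_compare_exchange_strong(&s->writer_active, &expected, true))
        return false;                         /* another writer holds it */
    long readers = 0;
    for (int i = 0; i < NR_SLOTS; i++)
        readers += atomic_load(&s->read_count[i]);
    if (readers == 0)
        return true;                          /* no readers left inside */
    atomic_store(&s->writer_active, false);   /* readers active: give up */
    return false;
}

static void toy_up_write(struct toy_pcpu_rwsem *s)
{
    assert(atomic_load(&s->writer_active));   /* must be held by a writer */
    atomic_store(&s->writer_active, false);
}
```

The trade-off matches the one described above: readers pay for a single local counter update, and the cost is shifted onto the rare writer, which has to sum the counters of every CPU.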
So, if you follow the "recommended" workflow, the only remaining possible
scalability bottleneck is cgroup_mutex.
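A rough userspace sketch of that workflow follows, assuming cgroup v2 and Linux >= 5.7 UAPI headers (for struct clone_args::cgroup and CLONE_INTO_CGROUP). run_in_new_cgroup(), make_clone_args() and write_file() are hypothetical helper names; the caller is assumed to own a delegated parent cgroup that has no processes of its own (cgroup v2's no-internal-process rule), and error handling is kept minimal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/sched.h>  /* struct clone_args, CLONE_INTO_CGROUP (>= 5.7) */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: clone3() args that start the child in cgroup_fd. */
static struct clone_args make_clone_args(int cgroup_fd)
{
    assert(cgroup_fd >= 0);
    struct clone_args args = {
        .flags = CLONE_INTO_CGROUP,
        .exit_signal = SIGCHLD,        /* behave like fork() for waitpid() */
        .cgroup = (uint64_t)cgroup_fd, /* fd of the target cgroup dir */
    };
    return args;
}

/* Hypothetical helper: write a string such as "+memory" to a control file. */
static int write_file(const char *path, const char *s)
{
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    size_t len = strlen(s);
    ssize_t n = write(fd, s, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Hypothetical helper: create <parent>/<name>, run argv in it, remove it.
 * Returns the child's raw wait status, or -1 on error. */
static int run_in_new_cgroup(const char *parent, const char *name,
                             const char *controllers, char *const argv[])
{
    char path[4096], ctl[4096];
    struct clone_args args;
    int cgfd, status;
    pid_t pid;

    snprintf(path, sizeof(path), "%s/%s", parent, name);

    /* 1. Create the (empty) cgroup. */
    if (mkdir(path, 0755) < 0)
        return -1;

    /* 2. Enable controllers (e.g. "+memory +pids") for parent's children. */
    if (controllers) {
        snprintf(ctl, sizeof(ctl), "%s/cgroup.subtree_control", parent);
        if (write_file(ctl, controllers) < 0)
            goto err_rmdir;
    }

    /* 3. Seed it: the child is created directly inside the new cgroup,
     *    so no existing process has to be migrated into it. */
    cgfd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (cgfd < 0)
        goto err_rmdir;
    args = make_clone_args(cgfd);
    pid = (pid_t)syscall(__NR_clone3, &args, sizeof(args));
    if (pid == 0) {                    /* child */
        execvp(argv[0], argv);
        _exit(127);
    }
    close(cgfd);
    if (pid < 0)
        goto err_rmdir;

    /* 4. Reap the child, then remove the now-empty cgroup.
     *    rmdir() fails with EBUSY if the child left descendants behind. */
    if (waitpid(pid, &status, 0) < 0)
        goto err_rmdir;
    if (rmdir(path) < 0)
        return -1;
    return status;

err_rmdir:
    rmdir(path);
    return -1;
}
```

Because the child starts life in the target cgroup, there is no separate write to cgroup.procs, which is the migration path this thread is about.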
Thanks.
--
tejun