[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aLm-2Lcnu602AB85@slm.duckdns.org>
Date: Thu, 4 Sep 2025 06:31:20 -1000
From: Tejun Heo <tj@...nel.org>
To: Yi Tao <escape@...ux.alibaba.com>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with
signal_struct->group_rwsem when writing cgroup.procs/threads
Hello,
On Thu, Sep 04, 2025 at 07:39:32PM +0800, Yi Tao wrote:
...
> To avoid affecting other users, the per-thread-group rwsem is only used
> when the `favordynmods` flag is enabled.
Can you please:
- Note that this isn't necessary for cgroup2's recommended workflow and is
thus gated behind favordynmods.
- Include performance numbers briefly.
> +extern bool have_favordynmods;
> +
> /**
> * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
> * @tsk: target task
> @@ -838,6 +840,8 @@ struct cgroup_of_peak {
> static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
> {
> percpu_down_read(&cgroup_threadgroup_rwsem);
> + if (have_favordynmods)
> + down_read(&tsk->signal->group_rwsem);
> }
>
> /**
> @@ -848,6 +852,8 @@ static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
> */
> static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
> {
> + if (have_favordynmods)
> + up_read(&tsk->signal->group_rwsem);
> percpu_up_read(&cgroup_threadgroup_rwsem);
Hmm... I wonder whether turning on/off the flag is racy. ie. what prevents
have_favordynmods flipping while a task is between change_begin and end?
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 1ef1edbaaf79..86fbc99a9174 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -226,6 +226,10 @@ struct signal_struct {
> struct tty_audit_buf *tty_audit_buf;
> #endif
>
> +#ifdef CONFIG_CGROUPS
> + struct rw_semaphore group_rwsem;
> +#endif
Maybe name it more specific - e.g. cgroup_threadgroup_rwsem?
> /**
> * cgroup_attach_lock - Lock for ->attach()
> - * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
> + * @tsk: thread group to lock
> + * @lock_threadgroup: whether to down_write rwsem
> *
> * cgroup migration sometimes needs to stabilize threadgroups against forks and
> * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
> @@ -2480,21 +2481,30 @@ EXPORT_SYMBOL_GPL(cgroup_path_ns);
> * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
> * CPU hotplug is disabled on entry.
> */
Please expand the function comment to explain what's going on and why and
maybe point to it from a comment on top of favor_dynmods.
> -void cgroup_attach_lock(bool lock_threadgroup)
> +void cgroup_attach_lock(struct task_struct *tsk, bool lock_threadgroup)
As @tsk is an optional argument, it'd probably make more sense to put it at
the end.
> @@ -3010,15 +3008,27 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> */
> if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
> tsk = ERR_PTR(-EINVAL);
> - goto out_unlock_threadgroup;
> + goto out_unlock_rcu;
> }
> -
> get_task_struct(tsk);
> - goto out_unlock_rcu;
>
> -out_unlock_threadgroup:
> - cgroup_attach_unlock(*threadgroup_locked);
> - *threadgroup_locked = false;
> + rcu_read_unlock();
> +
> + /*
> + * If we migrate a single thread, we don't care about threadgroup
> + * stability. If the thread is `current`, it won't exit(2) under our
> + * hands or change PID through exec(2). We exclude
> + * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
> + * callers by cgroup_mutex.
> + * Therefore, we can skip the global lock.
> + */
> + lockdep_assert_held(&cgroup_mutex);
> + *threadgroup_locked = pid || threadgroup;
> +
> + cgroup_attach_lock(tsk, *threadgroup_locked);
I'm not sure this relocation is safe. What prevents e.g. @tsk changing its
group leader or signal struct before lock is grabbed?
Thanks.
--
tejun
Powered by blists - more mailing lists