linux-kernel - Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with signal_struct->group

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aLm-2Lcnu602AB85@slm.duckdns.org>
Date: Thu, 4 Sep 2025 06:31:20 -1000
From: Tejun Heo <tj@...nel.org>
To: Yi Tao <escape@...ux.alibaba.com>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with
 signal_struct->group_rwsem when writing cgroup.procs/threads

Hello,

On Thu, Sep 04, 2025 at 07:39:32PM +0800, Yi Tao wrote:
...
> To avoid affecting other users, the per-thread-group rwsem is only used
> when the `favordynmods` flag is enabled.

Can you please:

- Note that this isn't necessary for cgroup2's recommended workflow and is
  thus gated behind favordynmods.

- Include performance numbers briefly.

> +extern bool have_favordynmods;
> +
>  /**
>   * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
>   * @tsk: target task
> @@ -838,6 +840,8 @@ struct cgroup_of_peak {
>  static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
>  {
>  	percpu_down_read(&cgroup_threadgroup_rwsem);
> +	if (have_favordynmods)
> +		down_read(&tsk->signal->group_rwsem);
>  }
>  
>  /**
> @@ -848,6 +852,8 @@ static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
>   */
>  static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
>  {
> +	if (have_favordynmods)
> +		up_read(&tsk->signal->group_rwsem);
>  	percpu_up_read(&cgroup_threadgroup_rwsem);

Hmm... I wonder whether turning on/off the flag is racy. ie. what prevents
have_favordynmods flipping while a task is between change_begin and end?

> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 1ef1edbaaf79..86fbc99a9174 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -226,6 +226,10 @@ struct signal_struct {
>  	struct tty_audit_buf *tty_audit_buf;
>  #endif
>  
> +#ifdef CONFIG_CGROUPS
> +	struct rw_semaphore group_rwsem;
> +#endif

Maybe name it more specific - e.g. cgroup_threadgroup_rwsem?

>  /**
>   * cgroup_attach_lock - Lock for ->attach()
> - * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
> + * @tsk: thread group to lock
> + * @lock_threadgroup: whether to down_write rwsem
>   *
>   * cgroup migration sometimes needs to stabilize threadgroups against forks and
>   * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
> @@ -2480,21 +2481,30 @@ EXPORT_SYMBOL_GPL(cgroup_path_ns);
>   * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
>   * CPU hotplug is disabled on entry.
>   */

Please expand the function comment to explain what's going on and why and
maybe point to it from a comment on top of favor_dynmods.

> -void cgroup_attach_lock(bool lock_threadgroup)
> +void cgroup_attach_lock(struct task_struct *tsk, bool lock_threadgroup)

As @tsk is an optional argument, it'd probably make more sense to put it at
the end.

> @@ -3010,15 +3008,27 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>  	 */
>  	if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
>  		tsk = ERR_PTR(-EINVAL);
> -		goto out_unlock_threadgroup;
> +		goto out_unlock_rcu;
>  	}
> -
>  	get_task_struct(tsk);
> -	goto out_unlock_rcu;
>  
> -out_unlock_threadgroup:
> -	cgroup_attach_unlock(*threadgroup_locked);
> -	*threadgroup_locked = false;
> +	rcu_read_unlock();
> +
> +	/*
> +	 * If we migrate a single thread, we don't care about threadgroup
> +	 * stability. If the thread is `current`, it won't exit(2) under our
> +	 * hands or change PID through exec(2). We exclude
> +	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
> +	 * callers by cgroup_mutex.
> +	 * Therefore, we can skip the global lock.
> +	 */
> +	lockdep_assert_held(&cgroup_mutex);
> +	*threadgroup_locked = pid || threadgroup;
> +
> +	cgroup_attach_lock(tsk, *threadgroup_locked);

I'm not sure this relocation is safe. What prevents e.g. @tsk changing its
group leader or signal struct before lock is grabbed?

Thanks.

-- 
tejun