linux-kernel - Re: [PATCH v5 2/3] cgroup: relocate cgroup_attach_lock within cgroup_procs_write


Message-ID: <b547fd22-4363-403a-a427-c20526fcf063@redhat.com>
Date: Wed, 10 Sep 2025 23:22:03 -0400
From: Waiman Long <llong@...hat.com>
To: Yi Tao <escape@...ux.alibaba.com>, tj@...nel.org, hannes@...xchg.org,
 mkoutny@...e.com
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 2/3] cgroup: relocate cgroup_attach_lock within
 cgroup_procs_write_start

On 9/10/25 2:59 AM, Yi Tao wrote:
> Later patches will introduce a new parameter `task` to
> cgroup_attach_lock, thus adjusting the position of cgroup_attach_lock
> within cgroup_procs_write_start.
>
> Between obtaining the threadgroup leader via PID and acquiring the
> cgroup attach lock, the threadgroup leader may change, which could lead
> to incorrect cgroup migration. Therefore, after acquiring the cgroup
> attach lock, we check whether the threadgroup leader has changed, and if
> so, retry the operation.
>
> Signed-off-by: Yi Tao <escape@...ux.alibaba.com>
> ---
>   kernel/cgroup/cgroup.c | 61 ++++++++++++++++++++++++++----------------
>   1 file changed, 38 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 2b88c7abaa00..756807164091 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -2994,29 +2994,13 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>   	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
>   		return ERR_PTR(-EINVAL);
>   
> -	/*
> -	 * If we migrate a single thread, we don't care about threadgroup
> -	 * stability. If the thread is `current`, it won't exit(2) under our
> -	 * hands or change PID through exec(2). We exclude
> -	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
> -	 * callers by cgroup_mutex.
> -	 * Therefore, we can skip the global lock.
> -	 */
> -	lockdep_assert_held(&cgroup_mutex);
> -
> -	if (pid || threadgroup)
> -		*lock_mode = CGRP_ATTACH_LOCK_GLOBAL;
> -	else
> -		*lock_mode = CGRP_ATTACH_LOCK_NONE;
> -
> -	cgroup_attach_lock(*lock_mode);
> -
> +retry_find_task:
>   	rcu_read_lock();
>   	if (pid) {
>   		tsk = find_task_by_vpid(pid);
>   		if (!tsk) {
>   			tsk = ERR_PTR(-ESRCH);
> -			goto out_unlock_threadgroup;
> +			goto out_unlock_rcu;
>   		}
>   	} else {
>   		tsk = current;
> @@ -3033,15 +3017,46 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>   	 */
>   	if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
>   		tsk = ERR_PTR(-EINVAL);
> -		goto out_unlock_threadgroup;
> +		goto out_unlock_rcu;
>   	}
>   
>   	get_task_struct(tsk);
> -	goto out_unlock_rcu;
> +	rcu_read_unlock();
> +
> +	/*
> +	 * If we migrate a single thread, we don't care about threadgroup
> +	 * stability. If the thread is `current`, it won't exit(2) under our
> +	 * hands or change PID through exec(2). We exclude
> +	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
> +	 * callers by cgroup_mutex.
> +	 * Therefore, we can skip the global lock.
> +	 */
> +	lockdep_assert_held(&cgroup_mutex);
> +
> +	if (pid || threadgroup)
> +		*lock_mode = CGRP_ATTACH_LOCK_GLOBAL;
> +	else
> +		*lock_mode = CGRP_ATTACH_LOCK_NONE;
> +
> +	cgroup_attach_lock(*lock_mode);
> +
> +	if (threadgroup) {
> +		if (!thread_group_leader(tsk)) {

Nit: You can combine the two conditions to avoid the extra level of indentation.

  if (threadgroup && !thread_group_leader(tsk)) {
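
To illustrate the nit, here is a minimal user-space sketch showing that the nested and combined forms take the same branch for every input. `struct fake_task`, `is_leader()` and the two `needs_retry_*()` helpers are hypothetical stand-ins, not kernel code, and the equivalence assumes the outer `if (threadgroup)` block contains nothing besides the inner check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct task_struct; not kernel code. */
struct fake_task {
	bool leader;
};

/* Hypothetical stand-in for thread_group_leader(). */
static bool is_leader(const struct fake_task *tsk)
{
	return tsk->leader;
}

/* Nested form, shaped like the quoted patch hunk. */
static bool needs_retry_nested(bool threadgroup, const struct fake_task *tsk)
{
	if (threadgroup) {
		if (!is_leader(tsk))
			return true;
	}
	return false;
}

/* Combined form, one indentation level shallower. */
static bool needs_retry_combined(bool threadgroup, const struct fake_task *tsk)
{
	if (threadgroup && !is_leader(tsk))
		return true;
	return false;
}
```

Because `&&` short-circuits, `is_leader()` is only evaluated when `threadgroup` is true, just as in the nested form.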

> +			/*
> +			 * a race with de_thread from another thread's exec()

This should be "de_thread()" to make clear that it is a function.

> +			 * may strip us of our leadership, if this happens,
> +			 * there is no choice but to throw this task away and
> +			 * try again; this is
> +			 * "double-double-toil-and-trouble-check locking".

This "double-double-toil-and-trouble-check" is a new term in the kernel
source tree. I suggest using something simpler to avoid confusion.
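
For readers unfamiliar with the pattern that comment describes, the following is a rough user-space sketch of "look up, take the lock, re-check, retry". It is not the patch's code: `struct fake_task`, `struct lookup_seq`, `lookup()`, `find_leader_locked()` and the pthread mutex standing in for the attach lock are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's struct task_struct. */
struct fake_task {
	int id;
	bool leader;	/* false models a task that has lost leadership */
};

/* Hypothetical scripted lookup: hands out results[] one entry per call. */
struct lookup_seq {
	struct fake_task **results;
	size_t count;
	size_t next;
};

/* Stand-in for the attach lock; a plain mutex in this sketch. */
static pthread_mutex_t attach_lock = PTHREAD_MUTEX_INITIALIZER;

static struct fake_task *lookup(struct lookup_seq *seq)
{
	if (seq->next >= seq->count)
		return NULL;
	return seq->results[seq->next++];
}

/*
 * Look up a candidate without the lock, take the lock, then re-check
 * leadership. On a stale candidate, drop the lock and look up again.
 * Returns with attach_lock held on success, or NULL with it not held.
 */
static struct fake_task *find_leader_locked(struct lookup_seq *seq, int *retries)
{
	struct fake_task *tsk;

	*retries = 0;
	for (;;) {
		tsk = lookup(seq);
		if (!tsk)
			return NULL;

		pthread_mutex_lock(&attach_lock);
		if (tsk->leader)
			return tsk;	/* still the leader: keep the lock */

		pthread_mutex_unlock(&attach_lock);
		(*retries)++;
	}
}
```

The lock is dropped before retrying so that the next lookup runs under the same conditions as the first one, which is also why a caller must release the lock itself after a successful return.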

Cheers, Longman