linux-kernel - Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with signal_struct->group

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aLpKr6_r5exdc3EQ@slm.duckdns.org>
Date: Thu, 4 Sep 2025 16:27:59 -1000
From: Tejun Heo <tj@...nel.org>
To: escape <escape@...ux.alibaba.com>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with
 signal_struct->group_rwsem when writing cgroup.procs/threads

Hello,

On Fri, Sep 05, 2025 at 10:16:30AM +0800, escape wrote:
> > > +	if (have_favordynmods)
> > > +		up_read(&tsk->signal->group_rwsem);
> > >   	percpu_up_read(&cgroup_threadgroup_rwsem);
> > Hmm... I wonder whether turning on/off the flag is racy. ie. what prevents
> > have_favordynmods flipping while a task is between change_begin and end?
>
> have_favordynmods is read-only after initialization and will not change
> during runtime.

I don't think that's necessarily true. favordynmods can also be specified as
a mount option and mount can race against forks, execs and exits. Also,
IIRC, cgroup2 doesn't allow remounts but there's nothing preventing someone
from unmounting and mounting it again with different options.

> > > @@ -3010,15 +3008,27 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> > >   	 */
> > >   	if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
> > >   		tsk = ERR_PTR(-EINVAL);
> > > -		goto out_unlock_threadgroup;
> > > +		goto out_unlock_rcu;
> > >   	}
> > > -
> > >   	get_task_struct(tsk);
> > > -	goto out_unlock_rcu;
> > > -out_unlock_threadgroup:
> > > -	cgroup_attach_unlock(*threadgroup_locked);
> > > -	*threadgroup_locked = false;
> > > +	rcu_read_unlock();
> > > +
> > > +	/*
> > > +	 * If we migrate a single thread, we don't care about threadgroup
> > > +	 * stability. If the thread is `current`, it won't exit(2) under our
> > > +	 * hands or change PID through exec(2). We exclude
> > > +	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
> > > +	 * callers by cgroup_mutex.
> > > +	 * Therefore, we can skip the global lock.
> > > +	 */
> > > +	lockdep_assert_held(&cgroup_mutex);
> > > +	*threadgroup_locked = pid || threadgroup;
> > > +
> > > +	cgroup_attach_lock(tsk, *threadgroup_locked);
> > I'm not sure this relocation is safe. What prevents e.g. @tsk changing its
> > group leader or signal struct before lock is grabbed?
>
> When a non-leader thread in a thread group executes the exec system call,
> the thread group leader is updated, but the signal_struct remains unchanged,
> so this part is safe.

But the leader can change, right? So, we can end up in a situation where
threadgroup is set but the task is not the leader which I think can lead to
really subtle incorrect behaviors like write succeeding but nothing
happening when racing against exec.

Thanks.

-- 
tejun