linux-kernel - Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with signal_struct->group

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <11edd1da-7162-4f5a-b909-72c2f65e9db7@linux.alibaba.com>
Date: Fri, 5 Sep 2025 10:16:30 +0800
From: escape <escape@...ux.alibaba.com>
To: Tejun Heo <tj@...nel.org>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with
 signal_struct->group_rwsem when writing cgroup.procs/threads

Thanks for your review.

在 2025/9/5 00:31, Tejun Heo 写道:
> Hello,
>
> On Thu, Sep 04, 2025 at 07:39:32PM +0800, Yi Tao wrote:
> ...
>> To avoid affecting other users, the per-thread-group rwsem is only used
>> when the `favordynmods` flag is enabled.
> Can you please:
>
> - Note that this isn't necessary for cgroup2's recommended workflow and is
>    thus gated behind favordynmods.
>
> - Include performance numbers briefly.
>
>> +extern bool have_favordynmods;
>> +
>>   /**
>>    * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
>>    * @tsk: target task
>> @@ -838,6 +840,8 @@ struct cgroup_of_peak {
>>   static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
>>   {
>>   	percpu_down_read(&cgroup_threadgroup_rwsem);
>> +	if (have_favordynmods)
>> +		down_read(&tsk->signal->group_rwsem);
>>   }
>>   
>>   /**
>> @@ -848,6 +852,8 @@ static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
>>    */
>>   static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
>>   {
>> +	if (have_favordynmods)
>> +		up_read(&tsk->signal->group_rwsem);
>>   	percpu_up_read(&cgroup_threadgroup_rwsem);
> Hmm... I wonder whether turning on/off the flag is racy. ie. what prevents
> have_favordynmods flipping while a task is between change_begin and end?
have_favordynmods is read-only after initialization and will not change 
during runtime.
>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index 1ef1edbaaf79..86fbc99a9174 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -226,6 +226,10 @@ struct signal_struct {
>>   	struct tty_audit_buf *tty_audit_buf;
>>   #endif
>>   
>> +#ifdef CONFIG_CGROUPS
>> +	struct rw_semaphore group_rwsem;
>> +#endif
> Maybe name it more specific - e.g. cgroup_threadgroup_rwsem?
>
>>   /**
>>    * cgroup_attach_lock - Lock for ->attach()
>> - * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
>> + * @tsk: thread group to lock
>> + * @lock_threadgroup: whether to down_write rwsem
>>    *
>>    * cgroup migration sometimes needs to stabilize threadgroups against forks and
>>    * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
>> @@ -2480,21 +2481,30 @@ EXPORT_SYMBOL_GPL(cgroup_path_ns);
>>    * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
>>    * CPU hotplug is disabled on entry.
>>    */
> Please expand the function comment to explain what's going on and why and
> maybe point to it from a comment on top of favor_dynmods.
>
>> -void cgroup_attach_lock(bool lock_threadgroup)
>> +void cgroup_attach_lock(struct task_struct *tsk, bool lock_threadgroup)
> As @tsk is an optional argument, it'd probably make more sense to put it at
> the end.
>
>> @@ -3010,15 +3008,27 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>>   	 */
>>   	if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
>>   		tsk = ERR_PTR(-EINVAL);
>> -		goto out_unlock_threadgroup;
>> +		goto out_unlock_rcu;
>>   	}
>> -
>>   	get_task_struct(tsk);
>> -	goto out_unlock_rcu;
>>   
>> -out_unlock_threadgroup:
>> -	cgroup_attach_unlock(*threadgroup_locked);
>> -	*threadgroup_locked = false;
>> +	rcu_read_unlock();
>> +
>> +	/*
>> +	 * If we migrate a single thread, we don't care about threadgroup
>> +	 * stability. If the thread is `current`, it won't exit(2) under our
>> +	 * hands or change PID through exec(2). We exclude
>> +	 * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
>> +	 * callers by cgroup_mutex.
>> +	 * Therefore, we can skip the global lock.
>> +	 */
>> +	lockdep_assert_held(&cgroup_mutex);
>> +	*threadgroup_locked = pid || threadgroup;
>> +
>> +	cgroup_attach_lock(tsk, *threadgroup_locked);
> I'm not sure this relocation is safe. What prevents e.g. @tsk changing its
> group leader or signal struct before lock is grabbed?
When a non-leader thread in a thread group executes the exec system call,
the thread group leader is updated, but the signal_struct remains unchanged,
so this part is safe.
>
> Thanks.

I will accept the remaining comments and apply them in the next version of
the patch.

Thanks.