[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <11edd1da-7162-4f5a-b909-72c2f65e9db7@linux.alibaba.com>
Date: Fri, 5 Sep 2025 10:16:30 +0800
From: escape <escape@...ux.alibaba.com>
To: Tejun Heo <tj@...nel.org>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with
signal_struct->group_rwsem when writing cgroup.procs/threads
Thanks for your review.
在 2025/9/5 00:31, Tejun Heo 写道:
> Hello,
>
> On Thu, Sep 04, 2025 at 07:39:32PM +0800, Yi Tao wrote:
> ...
>> To avoid affecting other users, the per-thread-group rwsem is only used
>> when the `favordynmods` flag is enabled.
> Can you please:
>
> - Note that this isn't necessary for cgroup2's recommended workflow and is
> thus gated behind favordynmods.
>
> - Include performance numbers briefly.
>
>> +extern bool have_favordynmods;
>> +
>> /**
>> * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
>> * @tsk: target task
>> @@ -838,6 +840,8 @@ struct cgroup_of_peak {
>> static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
>> {
>> percpu_down_read(&cgroup_threadgroup_rwsem);
>> + if (have_favordynmods)
>> + down_read(&tsk->signal->group_rwsem);
>> }
>>
>> /**
>> @@ -848,6 +852,8 @@ static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
>> */
>> static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
>> {
>> + if (have_favordynmods)
>> + up_read(&tsk->signal->group_rwsem);
>> percpu_up_read(&cgroup_threadgroup_rwsem);
> Hmm... I wonder whether turning on/off the flag is racy. ie. what prevents
> have_favordynmods flipping while a task is between change_begin and end?
have_favordynmods is read-only after initialization and will not change
during runtime.
>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> index 1ef1edbaaf79..86fbc99a9174 100644
>> --- a/include/linux/sched/signal.h
>> +++ b/include/linux/sched/signal.h
>> @@ -226,6 +226,10 @@ struct signal_struct {
>> struct tty_audit_buf *tty_audit_buf;
>> #endif
>>
>> +#ifdef CONFIG_CGROUPS
>> + struct rw_semaphore group_rwsem;
>> +#endif
> Maybe name it more specific - e.g. cgroup_threadgroup_rwsem?
>
>> /**
>> * cgroup_attach_lock - Lock for ->attach()
>> - * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
>> + * @tsk: thread group to lock
>> + * @lock_threadgroup: whether to down_write rwsem
>> *
>> * cgroup migration sometimes needs to stabilize threadgroups against forks and
>> * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
>> @@ -2480,21 +2481,30 @@ EXPORT_SYMBOL_GPL(cgroup_path_ns);
>> * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
>> * CPU hotplug is disabled on entry.
>> */
> Please expand the function comment to explain what's going on and why and
> maybe point to it from a comment on top of favor_dynmods.
>
>> -void cgroup_attach_lock(bool lock_threadgroup)
>> +void cgroup_attach_lock(struct task_struct *tsk, bool lock_threadgroup)
> As @tsk is an optional argument, it'd probably make more sense to put it at
> the end.
>
>> @@ -3010,15 +3008,27 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>> */
>> if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
>> tsk = ERR_PTR(-EINVAL);
>> - goto out_unlock_threadgroup;
>> + goto out_unlock_rcu;
>> }
>> -
>> get_task_struct(tsk);
>> - goto out_unlock_rcu;
>>
>> -out_unlock_threadgroup:
>> - cgroup_attach_unlock(*threadgroup_locked);
>> - *threadgroup_locked = false;
>> + rcu_read_unlock();
>> +
>> + /*
>> + * If we migrate a single thread, we don't care about threadgroup
>> + * stability. If the thread is `current`, it won't exit(2) under our
>> + * hands or change PID through exec(2). We exclude
>> + * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
>> + * callers by cgroup_mutex.
>> + * Therefore, we can skip the global lock.
>> + */
>> + lockdep_assert_held(&cgroup_mutex);
>> + *threadgroup_locked = pid || threadgroup;
>> +
>> + cgroup_attach_lock(tsk, *threadgroup_locked);
> I'm not sure this relocation is safe. What prevents e.g. @tsk changing its
> group leader or signal struct before lock is grabbed?
When a non-leader thread in a thread group executes the exec system call,
the thread group leader is updated, but the signal_struct remains unchanged,
so this part is safe.
>
> Thanks.
I will accept the remaining comments and apply them in the next version of
the patch.
Thanks.
Powered by blists - more mailing lists