[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <93b146e5-08e4-4c10-bb90-8149e82263f0@linux.alibaba.com>
Date: Fri, 5 Sep 2025 11:44:53 +0800
From: escape <escape@...ux.alibaba.com>
To: Tejun Heo <tj@...nel.org>
Cc: hannes@...xchg.org, mkoutny@...e.com, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] cgroup: replace global percpu_rwsem with
signal_struct->group_rwsem when writing cgroup.procs/threads
Hello,
在 2025/9/5 10:27, Tejun Heo 写道:
> Hello,
>
> On Fri, Sep 05, 2025 at 10:16:30AM +0800, escape wrote:
>>>> + if (have_favordynmods)
>>>> + up_read(&tsk->signal->group_rwsem);
>>>> percpu_up_read(&cgroup_threadgroup_rwsem);
>>> Hmm... I wonder whether turning on/off the flag is racy. ie. what prevents
>>> have_favordynmods flipping while a task is between change_begin and end?
>> have_favordynmods is read-only after initialization and will not change
>> during runtime.
> I don't think that's necessarily true. favordynmods can also be specified as
> a mount option and mount can race against forks, execs and exits. Also,
> IIRC, cgroup2 doesn't allow remounts but there's nothing preventing someone
> from unmounting and mounting it again with different options.
There are two ways to set favordynmods. The first is through kernel
config or
the kernel command line parameter cgroup_favordynmods, which sets the
value of
the variable have_favordynmods and automatically sets the
CGRP_ROOT_FAVOR_DYNMODS
flag on cgroup_root->flags at mount time. The second is by passing a
mount option,
which only sets the CGRP_ROOT_FAVOR_DYNMODS flag on cgroup_root->flags
during mount.
In this patch, the decision whether to use the per-threadgroup rwsem is
based on
have_favordynmods, not on cgroup_root->flags. An umount & mount affects
cgroup_root->flags,
but does not change have_favordynmods.
Indeed, there is a minor flaw here: if favordynmods is enabled via a
mount option rather
than the kernel command line, the per-threadgroup rwsem will not take
effect.
To avoid the complexity of runtime changes to favordynmods, perhaps a
better approach
would be to introduce a separate control variable, configured via a
kernel command line
parameter such as cgroup_migration_lock=[global|thread_group] to
explicitly govern
this behavior. What do you think about this approach?
>>>> @@ -3010,15 +3008,27 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>>>> */
>>>> if (tsk->no_cgroup_migration || (tsk->flags & PF_NO_SETAFFINITY)) {
>>>> tsk = ERR_PTR(-EINVAL);
>>>> - goto out_unlock_threadgroup;
>>>> + goto out_unlock_rcu;
>>>> }
>>>> -
>>>> get_task_struct(tsk);
>>>> - goto out_unlock_rcu;
>>>> -out_unlock_threadgroup:
>>>> - cgroup_attach_unlock(*threadgroup_locked);
>>>> - *threadgroup_locked = false;
>>>> + rcu_read_unlock();
>>>> +
>>>> + /*
>>>> + * If we migrate a single thread, we don't care about threadgroup
>>>> + * stability. If the thread is `current`, it won't exit(2) under our
>>>> + * hands or change PID through exec(2). We exclude
>>>> + * cgroup_update_dfl_csses and other cgroup_{proc,thread}s_write
>>>> + * callers by cgroup_mutex.
>>>> + * Therefore, we can skip the global lock.
>>>> + */
>>>> + lockdep_assert_held(&cgroup_mutex);
>>>> + *threadgroup_locked = pid || threadgroup;
>>>> +
>>>> + cgroup_attach_lock(tsk, *threadgroup_locked);
>>> I'm not sure this relocation is safe. What prevents e.g. @tsk changing its
>>> group leader or signal struct before lock is grabbed?
>> When a non-leader thread in a thread group executes the exec system call,
>> the thread group leader is updated, but the signal_struct remains unchanged,
>> so this part is safe.
> But the leader can change, right? So, we can end up in a situation where
> threadgroup is set but the task is not the leader which I think can lead to
> really subtle incorrect behaviors like write succeeding but nothing
> happening when racing against exec.
>
> Thanks.
>
You are right. If the thread group leader changes, the task passed to
cgroup_attach_task may not be the leader, which could lead to errors when
iterating through all threads in while_each_thread.
Similar to commit b78949ebfb56 ("cgroup: simplify double-check locking in
cgroup_attach_proc"), it should check whether the leader has changed after
acquiring the lock, to determine if a retry is needed. I will fix this in
the next version patch.
Thanks.
Powered by blists - more mailing lists