Message-ID: <aRdyZZ9xHk5dLQxG@slm.duckdns.org>
Date: Fri, 14 Nov 2025 08:18:13 -1000
From: Tejun Heo <tj@...nel.org>
To: Michal Koutný <mkoutny@...e.com>
Cc: David Vernet <void@...ifault.com>, Andrea Righi <arighi@...dia.com>,
Changwoo Min <changwoo@...lia.com>,
Dan Schatzberg <dschatzberg@...a.com>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, sched-ext@...ts.linux.dev
Subject: Re: [PATCH 2/4] cgroup: Move dying_tasks cleanup from
cgroup_task_release() to cgroup_task_free()
Hello,
On Fri, Nov 14, 2025 at 06:48:17PM +0100, Michal Koutný wrote:
> On Tue, Oct 28, 2025 at 08:19:16PM -1000, Tejun Heo <tj@...nel.org> wrote:
> > An upcoming patch will defer the dying_tasks list addition, moving it from
> > cgroup_task_exit() (called from do_exit()) to a new function called from
> > finish_task_switch(). However, release_task() (which calls
> > cgroup_task_release()) can run either before or after finish_task_switch(),
>
> Just for better understanding -- when can release_task() run before
> finish_task_switch()?
I didn't test this explicitly, so please take it with a grain of salt, but I
think release_task() can run before the final task switch in both the
autoreap and !autoreap cases.

- When autoreap, the dying task calls exit_notify() and eventually calls
  release_task() on itself. This is obviously before the final switch.

- When !autoreap, it's a race. After exit_notify(), the parent can wait on
  the zombie task at any time, which calls release_task() through
  wait_task_zombie(). This can happen either before or after
  finish_task_switch().
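
To make the two orderings concrete, here is a minimal userspace sketch, not
kernel code. It assumes a toy list that mimics the kernel's list_head
helpers and a single-threaded replay of the events, so it only models
"release before the add" and not "release while the add is in progress".
struct toy_task and entry_left_stale() are hypothetical names made up for
this illustration, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise; harmless on a node that was never added. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Hypothetical stand-in for a task with a dying_tasks list node. */
struct toy_task { struct list_head dying_node; };

/*
 * Replay one exit. The "add" step stands in for the deferred addition done
 * at the final task switch, the "release" step for release_task(), and the
 * code after the loop for the free path, which always runs last.
 *
 * release_first: release happens before the add (the racy ordering).
 * del_in_free:   removal is done in the free step instead of in release.
 *
 * Returns true if the task is still linked on the list after the free step.
 */
bool entry_left_stale(bool release_first, bool del_in_free)
{
	struct list_head dying;
	struct toy_task t;

	INIT_LIST_HEAD(&dying);
	INIT_LIST_HEAD(&t.dying_node);
	assert(list_empty(&dying));

	for (int step = 0; step < 2; step++) {
		bool is_release = (step == 0) == release_first;

		if (is_release) {
			if (!del_in_free)
				list_del_init(&t.dying_node); /* old placement */
		} else {
			list_add_tail(&t.dying_node, &dying); /* deferred add */
		}
	}

	if (del_in_free)
		list_del_init(&t.dying_node); /* new placement, after both */

	return !list_empty(&dying);
}
```

In this toy model, entry_left_stale(true, false) is the only combination
that leaves the node linked: the early removal is a no-op and nothing
removes the node afterwards. With del_in_free set, the result no longer
depends on which of the two steps ran first.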
> > creating a race where cgroup_task_release() might try to remove the task from
> > dying_tasks before or while it's being added.
> >
> > Move the list_del_init() from cgroup_task_release() to cgroup_task_free() to
> > fix this race. cgroup_task_free() runs from __put_task_struct(), which is
> > always after both paths, making the cleanup safe.
>
> (Ah, now I get the reasoning of more likely pids '0' for CSS_TASK_ITER_PROCS.)
Yeah, I thought about filtering it out better, but since we can already show
pid 0 for tasks in foreign namespaces, maybe this is okay. What do you think?
Thanks.
--
tejun