Message-ID: <aRdyZZ9xHk5dLQxG@slm.duckdns.org>
Date: Fri, 14 Nov 2025 08:18:13 -1000
From: Tejun Heo <tj@...nel.org>
To: Michal Koutný <mkoutny@...e.com>
Cc: David Vernet <void@...ifault.com>, Andrea Righi <arighi@...dia.com>,
	Changwoo Min <changwoo@...lia.com>,
	Dan Schatzberg <dschatzberg@...a.com>,
	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	cgroups@...r.kernel.org, sched-ext@...ts.linux.dev
Subject: Re: [PATCH 2/4] cgroup: Move dying_tasks cleanup from
 cgroup_task_release() to cgroup_task_free()

Hello,

On Fri, Nov 14, 2025 at 06:48:17PM +0100, Michal Koutný wrote:
> On Tue, Oct 28, 2025 at 08:19:16PM -1000, Tejun Heo <tj@...nel.org> wrote:
> > An upcoming patch will defer the dying_tasks list addition, moving it from
> > cgroup_task_exit() (called from do_exit()) to a new function called from
> > finish_task_switch().
> > However, release_task() (which calls
> > cgroup_task_release()) can run either before or after finish_task_switch(),
> 
> Just for better understanding -- when can release_task() run before
> finish_task_switch()?

I didn't test explicitly, so please take it with a grain of salt, but I
think both autoreap and !autoreap cases can run before the final task
switch.

- With autoreap, the dying task calls exit_notify() and eventually calls
  release_task() on itself. This is obviously before the final switch.

- With !autoreap, it's a race. After exit_notify(), the parent can wait on
  the zombie task at any time, which will call release_task() through
  wait_task_zombie(). This can happen either before or after
  finish_task_switch().
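
To make the ordering concrete, here is a small userspace sketch, not kernel
code. The list helpers are simplified stand-ins for the kernel's list_head
API, and stale_entry_left() and its two flags are hypothetical names made up
for this illustration. It only models the two sequential orderings (release
before or after the final switch), not a truly concurrent add and delete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	assert(list_empty(n));	/* node must not already be linked */
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and reinitialize; a no-op on a node that is not linked. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/*
 * Hypothetical model: the list addition happens at the "final switch" step.
 * The removal happens either in the "release" step (del_in_release) or in a
 * "free" step that always runs last. Returns true if the task is still on
 * the dying list after everything has run.
 */
static bool stale_entry_left(bool release_before_switch, bool del_in_release)
{
	struct list_head dying_tasks, task_node;

	init_list_head(&dying_tasks);
	init_list_head(&task_node);

	if (release_before_switch) {
		if (del_in_release)
			list_del_init(&task_node);	/* no-op: not added yet */
		list_add_tail(&task_node, &dying_tasks);
	} else {
		list_add_tail(&task_node, &dying_tasks);
		if (del_in_release)
			list_del_init(&task_node);
	}

	if (!del_in_release)
		list_del_init(&task_node);	/* free step, after both paths */

	return !list_empty(&dying_tasks);
}
```

In this model, stale_entry_left(true, true) is the only combination that
leaves the node on the list: the removal ran first as a no-op and the later
addition was never undone. Doing the removal in the last step avoids that in
both orderings.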

> > creating a race where cgroup_task_release() might try to remove the task from
> > dying_tasks before or while it's being added.
> > 
> > Move the list_del_init() from cgroup_task_release() to cgroup_task_free() to
> > fix this race. cgroup_task_free() runs from __put_task_struct(), which is
> > always after both paths, making the cleanup safe.
> 
> (Ah, now I get the reasoning of more likely pids '0' for CSS_TASK_ITER_PROCS.)

Yeah, I thought about filtering it out better, but if we can already show
pid 0 for foreign ns tasks, maybe this is okay. What do you think?

Thanks.

-- 
tejun
