linux-kernel - Re: [PATCH] exit: add trace_task_exit() tracepoint before current->mm is reset

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250401173249.42d43a28@gandalf.local.home>
Date: Tue, 1 Apr 2025 17:32:49 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Andrii Nakryiko <andrii@...nel.org>
Cc: linux-trace-kernel@...r.kernel.org, peterz@...radead.org,
 mingo@...nel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, mhocko@...nel.org, oleg@...hat.com,
 brauner@...nel.org, glider@...gle.com, mhiramat@...nel.org,
 mathieu.desnoyers@...icios.com, akpm@...ux-foundation.org
Subject: Re: [PATCH] exit: add trace_task_exit() tracepoint before
 current->mm is reset

On Tue,  1 Apr 2025 11:40:21 -0700
Andrii Nakryiko <andrii@...nel.org> wrote:

Hi Andrii,

> It is useful to be able to access current->mm to, say, record a bunch of
> VMA information right before the task exits (e.g., for stack
> symbolization reasons when dealing with short-lived processes that exit
> in the middle of profiling session). We currently do have
> trace_sched_process_exit() in the exit path, but it is called a bit too
> late, after exit_mm() resets current->mm to NULL, which makes it
> unsuitable for inspecting and recording task's mm_struct-related data
> when tracing process lifetimes.

My fear of adding another task exit trace event is that it will get a
bit confusing as that we now have trace_sched_process_exit() and also
trace_task_exit() with slightly different semantics.

How about adding a trace_exit_mm()? Add that to the exit_mm() code?

static void exit_mm(void)
{
	struct mm_struct *mm = current->mm;

	exit_mm_release(current, mm);
	trace_exit_mm(mm);

??

> 
> There is a particularly suitable place, though, right after
> taskstats_exit() is called, but before we do exit_mm(). taskstats
> performs a similar kind of accounting that some applications do with
> BPF, and so co-locating them seems like a good fit.
> 
> Moving trace_sched_process_exit() a bit earlier would solve this problem
> as well, and I'm open to that. But this might potentially change its
> semantics a little, and so instead of risking that, I went for adding
> a new trace_task_exit() tracepoint instead. Tracepoints have zero
> overhead at runtime, unless actively traced, so this seems acceptable.
> 
> Also, existing trace_sched_process_exit() tracepoint is notoriously
> missing `group_dead` flag that is certainly useful in practice and some
> of our production applications have to work around this. So plumb
> `group_dead` through while at it, to have a richer and more complete
> tracepoint.

There should be no problem adding group_dead to the
trace_sched_process_exit() trace event. Adding fields should never cause
any user API breakage.

-- Steve