Message-ID: <Z-6TDh1MUT49lvjk@gmail.com>
Date: Thu, 3 Apr 2025 15:54:22 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Andrii Nakryiko <andrii@...nel.org>
Cc: linux-trace-kernel@...r.kernel.org, peterz@...radead.org,
	bpf@...r.kernel.org, linux-kernel@...r.kernel.org,
	kernel-team@...a.com, mhocko@...nel.org, rostedt@...dmis.org,
	oleg@...hat.com, brauner@...nel.org, glider@...gle.com,
	mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
	akpm@...ux-foundation.org
Subject: Re: [PATCH v2] exit: move and extend sched_process_exit() tracepoint


* Andrii Nakryiko <andrii@...nel.org> wrote:

> It is useful to be able to access current->mm at task exit to, say,
> record a bunch of VMA information right before the task exits (e.g., for
> stack symbolization reasons when dealing with short-lived processes that
> exit in the middle of profiling session). Currently,
> trace_sched_process_exit() is triggered after exit_mm() which resets
> current->mm to NULL making this tracepoint unsuitable for inspecting
> and recording task's mm_struct-related data when tracing process
> lifetimes.
> 
> There is a particularly suitable place, though, right after
> taskstats_exit() is called, but before we do exit_mm() and other
> exit_*() resource teardowns. taskstats performs a similar kind of
> accounting that some applications do with BPF, and so co-locating them
> seems like a good fit. So that's where trace_sched_process_exit() is
> moved with this patch.
> 
> Also, existing trace_sched_process_exit() tracepoint is notoriously
> missing `group_dead` flag that is certainly useful in practice and some
> of our production applications have to work around this. So plumb
> `group_dead` through while at it, to have a richer and more complete
> tracepoint.
> 
> Note that we can't use sched_process_template anymore, and so we use
> TRACE_EVENT()-based tracepoint definition. But all the field names and
> order, as well as assign and output logic remain intact. We just add one
> extra field at the end in backwards-compatible way.
> 
> Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
> ---
>  include/trace/events/sched.h | 28 +++++++++++++++++++++++++---
>  kernel/exit.c                |  2 +-
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 8994e97d86c1..05a14f2b35c3 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -328,9 +328,31 @@ DEFINE_EVENT(sched_process_template, sched_process_free,
>  /*
>   * Tracepoint for a task exiting:
>   */
> -DEFINE_EVENT(sched_process_template, sched_process_exit,
> -	     TP_PROTO(struct task_struct *p),
> -	     TP_ARGS(p));
> +TRACE_EVENT(sched_process_exit,
> +
> +	TP_PROTO(struct task_struct *p, bool group_dead),
> +
> +	TP_ARGS(p, group_dead),
> +
> +	TP_STRUCT__entry(
> +		__array(	char,	comm,	TASK_COMM_LEN	)
> +		__field(	pid_t,	pid			)
> +		__field(	int,	prio			)
> +		__field(	bool,	group_dead		)
> +	),
> +
> +	TP_fast_assign(
> +		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
> +		__entry->pid		= p->pid;
> +		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
> +		__entry->group_dead	= group_dead;
> +	),
> +
> +	TP_printk("comm=%s pid=%d prio=%d group_dead=%s",
> +		  __entry->comm, __entry->pid, __entry->prio,
> +		  __entry->group_dead ? "true" : "false"
> +	)

This feels really fragile, could you please at least add a comment that 
points out that this is basically an extension of 
sched_process_template, and that it should remain a subset of it, or 
something to that end?

Thanks,

	Ingo
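
The "extension of sched_process_template" relationship discussed above
can be illustrated with a small userspace sketch. It is not part of the
patch. The struct names template_entry and exit_entry and the helper
exit_entry_extends_template() are hypothetical stand-ins, the common
trace entry header that real events carry is omitted, and TASK_COMM_LEN
is restated locally. The sketch only shows that appending group_dead at
the end leaves the offsets of comm, pid and prio unchanged:

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>     /* offsetof */
#include <sys/types.h>  /* pid_t */

#define TASK_COMM_LEN 16 /* kernel value, restated here for userspace */

/* Hypothetical stand-in for the fields of sched_process_template. */
struct template_entry {
	char	comm[TASK_COMM_LEN];
	pid_t	pid;
	int	prio;
};

/* Hypothetical stand-in for the extended sched_process_exit entry. */
struct exit_entry {
	char	comm[TASK_COMM_LEN];
	pid_t	pid;
	int	prio;
	bool	group_dead;	/* new field, appended last */
};

/* True when field f sits at the same offset in both layouts. */
#define SAME_OFFSET(f) \
	(offsetof(struct template_entry, f) == offsetof(struct exit_entry, f))

/* Compile-time checks: the template fields form a prefix of exit_entry. */
static_assert(SAME_OFFSET(comm), "comm moved relative to the template");
static_assert(SAME_OFFSET(pid),  "pid moved relative to the template");
static_assert(SAME_OFFSET(prio), "prio moved relative to the template");
static_assert(offsetof(struct exit_entry, group_dead) >=
	      sizeof(struct template_entry),
	      "group_dead must come after all template fields");

/* Run-time form of the same checks, for use in a test. */
static inline bool exit_entry_extends_template(void)
{
	return SAME_OFFSET(comm) && SAME_OFFSET(pid) && SAME_OFFSET(prio) &&
	       offsetof(struct exit_entry, group_dead) >=
		       sizeof(struct template_entry);
}
```

If a field were later added to one layout but not the other, or the
order changed, the static_assert lines would fail at build time. That is
the kind of drift a comment next to the TRACE_EVENT() definition would
warn about.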
