Message-ID: <776b842b-b19f-44bf-bc34-ac756fce7466@efficios.com>
Date: Mon, 19 Feb 2024 13:01:16 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>, Oleg Nesterov <oleg@...hat.com>
Cc: wenyang.linux@...mail.com, Masami Hiramatsu <mhiramat@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] coredump debugging: add a tracepoint to report the coredumping
On 2024-02-19 12:28, Steven Rostedt wrote:
> On Mon, 19 Feb 2024 18:00:38 +0100
> Oleg Nesterov <oleg@...hat.com> wrote:
>
>>> void __noreturn do_exit(long code)
>>> {
>>> 	struct task_struct *tsk = current;
>>> 	int group_dead;
>>>
>>> 	[...]
>>> 	acct_collect(code, group_dead);
>>> 	if (group_dead)
>>> 		tty_audit_exit();
>>> 	audit_free(tsk);
>>>
>>> 	tsk->exit_code = code;
>>> 	taskstats_exit(tsk, group_dead);
>>>
>>> 	exit_mm();
>>>
>>> 	if (group_dead)
>>> 		acct_process();
>>> 	trace_sched_process_exit(tsk);
>>>
>>> There's a lot that happens before we trigger the above event.
>>
>> and a lot after.
>
> True. There really isn't a meaningful location here is there?
>
> I actually use this tracepoint in my pid tracing.
>
> The set_ftrace_pid and set_event_pid from /sys/kernel/tracing will add and
> remove PIDs if the options function-fork or event-fork are set respectively.
>
> I hook to the sched_process_fork tracepoint to add new PIDs if the parent
> pid is already in one of the files, and remove a PID via the
> sched_process_exit function.
No? Those hook on sched_process_free, which is the actual point where the
task is freed (AFAIR after it has been a zombie and has then been waited for
by another task).
kernel/trace/trace_events.c:
void trace_event_follow_fork(struct trace_array *tr, bool enable)
{
	if (enable) {
		register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
						       tr, INT_MIN);
		register_trace_prio_sched_process_free(event_filter_pid_sched_process_exit,
						       tr, INT_MAX);
	} else {
		unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
						    tr);
		unregister_trace_sched_process_free(event_filter_pid_sched_process_exit,
						    tr);
	}
}
kernel/trace/ftrace.c:
void ftrace_pid_follow_fork(struct trace_array *tr, bool enable)
{
	if (enable) {
		register_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
						  tr);
		register_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
						  tr);
	} else {
		unregister_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
						    tr);
		unregister_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
						    tr);
	}
}
AFAIU, "sched_process_exit" is issued close to the point where the task exits
(it should not go back to userspace after that). "sched_process_free" is done
when the task is really being removed.
Between "sched_process_exit" and "sched_process_free", the task can still be
observed by a trace analysis looking at sched and signal events: it's a zombie at
that stage.
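To make the zombie window concrete, here is a minimal userspace sketch (not
kernel code, and not part of the patch under discussion). It forks a child
that exits right away, then reads the child's state from /proc/<pid>/stat
before reaping it. The helper names proc_state() and
child_state_before_wait() are made up for this illustration, and it assumes a
Linux system with /proc mounted. The time between the child's exit and the
parent's waitpid() should roughly correspond to the window between the two
tracepoints described above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the one-letter state field from
 * /proc/<pid>/stat, or 0 if the file cannot be read or parsed.
 */
static char proc_state(pid_t pid)
{
	char path[64], buf[512];
	FILE *f;
	char *p;
	size_t n;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* The state letter follows the ")" that closes the comm field. */
	p = strrchr(buf, ')');
	if (!p || p[1] != ' ')
		return 0;
	return p[2];
}

/*
 * Hypothetical helper: fork a child that exits immediately and return
 * the state observed for it before the parent reaps it.
 */
static char child_state_before_wait(void)
{
	pid_t pid = fork();
	char state = 0;
	int i;

	if (pid < 0)
		return 0;
	if (pid == 0)
		_exit(0);	/* child: exit without returning to the caller */

	/* Poll (up to about one second) until the child shows up as a zombie. */
	for (i = 0; i < 1000; i++) {
		state = proc_state(pid);
		if (state == 'Z')
			break;
		usleep(1000);
	}

	/* Reap the child so the kernel can release the task. */
	waitpid(pid, NULL, 0);
	return state;
}
```

Calling child_state_before_wait() from a small main() and printing the result
is expected to show 'Z' on a typical Linux system: the task has exited but is
still visible until the parent waits for it.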
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com