linux-kernel - Re: [PATCH] coredump debugging: add a tracepoint to report the coredumping

Message-ID: <776b842b-b19f-44bf-bc34-ac756fce7466@efficios.com>
Date: Mon, 19 Feb 2024 13:01:16 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>, Oleg Nesterov <oleg@...hat.com>
Cc: wenyang.linux@...mail.com, Masami Hiramatsu <mhiramat@...nel.org>,
 Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...hsingularity.net>,
 Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] coredump debugging: add a tracepoint to report the
 coredumping

On 2024-02-19 12:28, Steven Rostedt wrote:
> On Mon, 19 Feb 2024 18:00:38 +0100
> Oleg Nesterov <oleg@...hat.com> wrote:
> 
>>> void __noreturn do_exit(long code)
>>> {
>>> 	struct task_struct *tsk = current;
>>> 	int group_dead;
>>>
>>> [...]
>>> 	acct_collect(code, group_dead);
>>> 	if (group_dead)
>>> 		tty_audit_exit();
>>> 	audit_free(tsk);
>>>
>>> 	tsk->exit_code = code;
>>> 	taskstats_exit(tsk, group_dead);
>>>
>>> 	exit_mm();
>>>
>>> 	if (group_dead)
>>> 		acct_process();
>>> 	trace_sched_process_exit(tsk);
>>>
>>> There's a lot that happens before we trigger the above event.
>>
>> and a lot after.
> 
> True. There really isn't a meaningful location here, is there?
> 
> I actually use this tracepoint in my pid tracing.
> 
> The set_ftrace_pid and set_event_pid from /sys/kernel/tracing will add and
> remove PIDs if the options function-fork or event-fork are set respectively.
> 
> I hook to the sched_process_fork tracepoint to add new PIDs if the parent
> pid is already in one of the files, and remove a PID via the
> sched_process_exit function.

No? Those hook on sched_process_free, which is the actual point where the
task is freed (AFAIR, after it has been a zombie and has then been waited for
by another task).

kernel/trace/trace_events.c:

void trace_event_follow_fork(struct trace_array *tr, bool enable)
{
         if (enable) {
                 register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
                                                        tr, INT_MIN);
                 register_trace_prio_sched_process_free(event_filter_pid_sched_process_exit,
                                                        tr, INT_MAX);
         } else {
                 unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
                                                     tr);
                 unregister_trace_sched_process_free(event_filter_pid_sched_process_exit,
                                                     tr);
         }
}

kernel/trace/ftrace.c:

void ftrace_pid_follow_fork(struct trace_array *tr, bool enable)
{
         if (enable) {
                 register_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
                                                   tr);
                 register_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
                                                   tr);
         } else {
                 unregister_trace_sched_process_fork(ftrace_pid_follow_sched_process_fork,
                                                     tr);
                 unregister_trace_sched_process_free(ftrace_pid_follow_sched_process_exit,
                                                     tr);
         }
}
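
To make the consequence of those registrations concrete, here is a minimal
userspace toy model of a "follow fork" PID filter. It is only a sketch, not
the kernel implementation: every toy_* name and TOY_MAX_PIDS is hypothetical,
and the real code uses the kernel's own PID list helpers. It assumes the
behaviour described above: the fork hook adds the child when the parent is
already filtered, nothing happens at exit, and the PID is only dropped by the
hook attached to the "free" event.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PIDS 64	/* hypothetical capacity, for the model only */

/* Toy stand-in for a traced-PID list; not a kernel data structure. */
struct toy_pid_filter {
	int pids[TOY_MAX_PIDS];
	size_t count;
};

bool toy_filter_contains(const struct toy_pid_filter *f, int pid)
{
	for (size_t i = 0; i < f->count; i++)
		if (f->pids[i] == pid)
			return true;
	return false;
}

bool toy_filter_add(struct toy_pid_filter *f, int pid)
{
	if (toy_filter_contains(f, pid))
		return true;
	if (f->count == TOY_MAX_PIDS)
		return false;	/* model is full */
	f->pids[f->count++] = pid;
	return true;
}

void toy_filter_remove(struct toy_pid_filter *f, int pid)
{
	for (size_t i = 0; i < f->count; i++) {
		if (f->pids[i] == pid) {
			/* Order does not matter: move the last entry here. */
			f->pids[i] = f->pids[--f->count];
			return;
		}
	}
}

/* Models the fork hook: the child is followed only if the parent is. */
void toy_on_fork(struct toy_pid_filter *f, int parent, int child)
{
	if (toy_filter_contains(f, parent))
		toy_filter_add(f, child);
}

/* Models the exit event: nothing is hooked here, so nothing changes. */
void toy_on_exit(struct toy_pid_filter *f, int pid)
{
	(void)f;
	(void)pid;
}

/* Models the hook attached to the "free" event: the PID is dropped. */
void toy_on_free(struct toy_pid_filter *f, int pid)
{
	toy_filter_remove(f, pid);
}
```

In this model, a child forked from a filtered parent stays in the filter
after toy_on_exit() and only disappears after toy_on_free(), which mirrors
the window during which the task is a zombie.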

AFAIU, "sched_process_exit" is emitted close to the point where the task exits
(it should not go back to userspace after that). "sched_process_free" is
emitted when the task is actually being freed.

Between "sched_process_exit" and "sched_process_free", the task can still be
observed by a trace analysis looking at sched and signal events: it's a zombie at
that stage.
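
The zombie window can also be observed from userspace. The sketch below is an
illustration only, with hypothetical names (struct lifetime_probe,
probe_child_lifetime): it forks a child that exits immediately, waits for the
exit with WNOWAIT so that the child is not reaped, checks that the PID still
exists, and only then reaps it. The comments relating each step to
"sched_process_exit" and "sched_process_free" are an assumption based on the
description above, not something the program itself verifies.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for waitid(), WNOWAIT, siginfo_t */
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result structure for this illustration. */
struct lifetime_probe {
	int exited_seen;	/* waitid(WNOWAIT) reported a normal exit */
	int alive_while_zombie;	/* PID still existed before reaping */
	int gone_after_reap;	/* PID no longer existed after reaping */
};

/* Returns 0 on success, -1 if fork() or waitid() failed. */
int probe_child_lifetime(struct lifetime_probe *out)
{
	siginfo_t info;
	pid_t pid;

	/* Make sure children are not auto-reaped by an inherited SIG_IGN. */
	signal(SIGCHLD, SIG_DFL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: exit right away */

	/*
	 * Wait until the child has exited, but leave it unreaped. From here
	 * on the child is a zombie (assumed: past "sched_process_exit").
	 */
	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
		return -1;
	out->exited_seen = (info.si_code == CLD_EXITED);

	/* Signal 0 only checks for existence: the zombie is still there. */
	out->alive_while_zombie = (kill(pid, 0) == 0);

	/*
	 * Reap the child. Only after this can the kernel release the task
	 * (assumed: "sched_process_free" follows at some point).
	 */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0)
		return -1;
	out->gone_after_reap = (kill(pid, 0) == -1 && errno == ESRCH);

	return 0;
}
```

On a typical Linux system all three fields are expected to end up set to 1,
showing that the exited task remains visible until it has been waited for.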

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com