linux-kernel - Re: [PATCH v9 00/13] unwind_user: x86: Deferred unwinding infrastructure

Message-ID: <20250520195549.17f6c2c7@gandalf.local.home>
Date: Tue, 20 May 2025 19:55:49 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Namhyung Kim <namhyung@...nel.org>, linux-kernel@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org, x86@...nel.org,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Josh Poimboeuf
 <jpoimboe@...nel.org>, Peter Zijlstra <peterz@...radead.org>, Ingo Molnar
 <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>, Thomas Gleixner
 <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, Dave Hansen
 <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, Andrii
 Nakryiko <andrii@...nel.org>
Subject: Re: [PATCH v9 00/13] unwind_user: x86: Deferred unwinding
 infrastructure

On Wed, 21 May 2025 08:26:05 +0900
Masami Hiramatsu (Google) <mhiramat@...nel.org> wrote:

> > Maybe I asked this before but I don't remember if I got the answer. :)
> > How does it handle task exits as it won't go to userspace?  I guess it'll
> > lose user callstacks for exit syscalls and other termination paths.

I just checked, and the good news is that task_work does indeed get called
when a task exits. The bad news is that it happens after do_exit() cleans
up the task's "mm" structure via exit_mm(), which means that current->mm is
already NULL by the time the task_work runs :-p

There's a proposal to move trace_sched_process_exit() to before exit_mm().
If that happens, we could make that tracepoint a "faultable" tracepoint and
then the unwind infrastructure could attach to it and do the unwinding from
that tracepoint.
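
To make the ordering problem concrete, here is a toy userspace model of it.
This is not kernel code: struct task, model_do_exit(), try_user_unwind() and
the other names are made up for illustration, and the only assumption taken
from the above is that the mm is torn down before task_work runs, and that a
hook placed before exit_mm() would still see a valid mm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct mm {
	int dummy;
};

struct task {
	struct mm *mm;				/* models current->mm */
	void (*pending_work)(struct task *);	/* models one queued task_work */
	bool unwound;				/* did the user unwind succeed? */
};

/* Hypothetical unwinder: it can only walk the user stack while mm exists. */
bool try_user_unwind(const struct task *t)
{
	return t->mm != NULL;
}

/* The deferred callback, as it would be run from task_work. */
void deferred_unwind(struct task *t)
{
	t->unwound = try_user_unwind(t);
}

/* Run the queued work once, if there is any. */
void run_pending_work(struct task *t)
{
	if (t->pending_work) {
		void (*work)(struct task *) = t->pending_work;

		t->pending_work = NULL;
		work(t);
	}
}

/*
 * Models the exit path. With hook_before_exit_mm == false the work only
 * runs after the mm is gone (the current behavior described above). With
 * true, it models a faultable tracepoint placed before exit_mm().
 */
void model_do_exit(struct task *t, bool hook_before_exit_mm)
{
	if (hook_before_exit_mm)
		run_pending_work(t);

	t->mm = NULL;		/* models exit_mm() */

	run_pending_work(t);	/* models task_work running at exit */
}

/* Convenience wrapper: does the deferred unwind succeed for this ordering? */
bool exit_unwind_succeeds(bool hook_before_exit_mm)
{
	struct mm mm = { 0 };
	struct task t = {
		.mm = &mm,
		.pending_work = deferred_unwind,
		.unwound = false,
	};

	model_do_exit(&t, hook_before_exit_mm);
	return t.unwound;
}
```

In this model exit_unwind_succeeds(false) returns false, because the callback
finds mm == NULL, while exit_unwind_succeeds(true) returns true. The real
exit path has many more steps; the sketch only captures the relative order of
the mm teardown and the deferred callback.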

> > 
> > Similarly, it will miss user callstacks in the samples at the end of
> > profiling if the target tasks remain in the kernel (or they sleep).
> > It looks like a fundamental limitation of the deferred callchains.  

Yes, that is a limitation.

> 
> Can we use a hybrid approach for this case?
> It might be more balanced (from the performance point of view) to save
> the full stack in a classic way only in this case, rather than faulting
> on process exit or doing file access just to load the sframe.

Another approach is that the tool (like perf) could request to take the
user space stack trace every time a task enters the kernel via a system
call.

-- Steve