Message-ID: <20250829190935.7e014820@gandalf.local.home>
Date: Fri, 29 Aug 2025 19:09:35 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@...il.com>, Steven Rostedt
<rostedt@...nel.org>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org, x86@...nel.org,
Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, Josh Poimboeuf <jpoimboe@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, Jiri
Olsa <jolsa@...nel.org>, Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Andrii Nakryiko <andrii@...nel.org>, Indu Bhagat <indu.bhagat@...cle.com>,
"Jose E. Marchesi" <jemarch@....org>, Beau Belgrave
<beaub@...ux.microsoft.com>, Jens Remus <jremus@...ux.ibm.com>, Andrew
Morton <akpm@...ux-foundation.org>, Florian Weimer <fweimer@...hat.com>,
Sam James <sam@...too.org>, Kees Cook <kees@...nel.org>, "Carlos O'Donell"
<codonell@...hat.com>
Subject: Re: [PATCH v6 5/6] tracing: Show inode and device major:minor in
deferred user space stacktrace
On Fri, 29 Aug 2025 15:40:07 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Fri, 29 Aug 2025 at 14:18, Steven Rostedt <rostedt@...dmis.org> wrote:
> > >
> > > Just do the parsing at parse time. End of story.
> >
> > What does "parsing at parse time" mean?
>
> In user space. When parsing the trace events.
>
> Not in kernel space, when generating the events.
>
> Arnaldo already said it was workable.
Perf does do things differently, as I believe it processes the events as it
reads them from the kernel (Arnaldo, correct me if I'm wrong).
For the tracefs code, the raw data gets saved directly into a file, and the
processing happens after the fact. If a tool is recording, it still needs a
way to know what those hash values mean after the tracing is complete.
The same goes for when the user cats the "trace" file. If the vmas have
already been freed by the time that happens, how do we map these hashes back
to the files they came from? Do we need trace events in the unmap path to
trigger that? If tracing is no longer recording, those events will be
dropped too. Also, we only want to record the vmas that show up in the
stack traces, not just any vma.
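To put it concretely, what ends up in the ring buffer per frame under that
scheme would be roughly this (the struct and field names here are just made
up for illustration):

        struct user_stack_entry {
                u64     offset;         /* (addr - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT) */
                u64     filehash;       /* hash(vma->vm_file), opaque to the kernel */
        };

and the post-processing tool needs some way to turn that filehash back into
a path long after the process and its vmas are gone.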
>
> > When I get a user space stack trace, I get the virtual addresses of each of
> > the user space functions. This is saved into an user stack trace event in
> > the ring buffer that usually gets mapped right to a file for post
> > processing.
> >
> > I still do the:
> >
> > user_stack_trace() {
> >         foreach addr in each stack frame
> >                 vma = vma_lookup(mm, addr);
> >                 callchain[i++] = (addr - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
> >
> > Are you saying that this shouldn't be done either?
>
> I'm saying that's ALL that should be done. And then that *ONE* single thing:
>
> callchain_filehash[i++] = hash(vma->vm_file);
>
> BUT NOTHING ELSE.
>
> None of that trace_file_map() garbage.
>
> None of that add_into_hash() garbage.
>
> NOTHING like that. You don't look at the hash. You don't "register"
> it. You don't touch it in any way. You literally just use it as a
> value, and user space will figure it out later. At event parsing time.
I guess this is where I'm stuck. How does user space know what those hash
values mean? Where does it get the information from?
>
> At most, you could have some trivial optimization to avoid hashing the
> same pointer twice, ie have some single-entry cache of "it's still the
> same file pointer, I'll just use the same hash I calculated last
> time".
>
> And I mean *single*-level, because siphash is fast enough that doing
> anything *more* than that is going to be slower than just
> re-calculating the hash.
>
> In fact, you should probably do that optimization at the whole
> vma_lookup() level, and try to not look up the same vma multiple times
> when a common situation is probably that you'll have multiple stack
> frames all with entries pointing to the same executable (or library)
> mapping. Because "vma_lookup()" is likely about as expensive as the
> hashing is.
Yeah, we could add an optimization to cache vmas during the callchain walk
and check whether the next callchain entry falls within a vma we already
looked up. Could even just cache the previous vma, as it's not that common
to have one library calling into another and back again.
That is, this would likely be useful:
        vma = NULL;
        foreach addr in callchain
                if (!vma || addr < vma->vm_start || addr >= vma->vm_end)
                        vma = vma_lookup(mm, addr);
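Something like this completely untested sketch is what I have in mind for
the combined walk (the helper name is made up, hash() is a stand-in for
whatever siphash call we end up using, and the caller is assumed to hold
mmap_read_lock()):

        static void user_stack_offsets(struct mm_struct *mm,
                                       unsigned long *callchain,
                                       u64 *filehash, int nr)
        {
                struct vm_area_struct *vma = NULL;
                int i;

                for (i = 0; i < nr; i++) {
                        unsigned long addr = callchain[i];

                        /* Reuse the previous vma if this frame falls inside it */
                        if (!vma || addr < vma->vm_start || addr >= vma->vm_end)
                                vma = vma_lookup(mm, addr);

                        if (!vma || !vma->vm_file)
                                continue;

                        /* Record the file offset and an opaque hash of the file */
                        callchain[i] = (addr - vma->vm_start) +
                                       (vma->vm_pgoff << PAGE_SHIFT);
                        filehash[i] = hash(vma->vm_file);       /* stand-in */
                }
        }

The single-entry cache should cover the common case you mention, where
consecutive frames all land in the same executable or library mapping.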
-- Steve