[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aLC2Vs06UifGU2HZ@x1>
Date: Thu, 28 Aug 2025 17:04:38 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@...il.com>,
Steven Rostedt <rostedt@...nel.org>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org,
x86@...nel.org, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrii Nakryiko <andrii@...nel.org>,
Indu Bhagat <indu.bhagat@...cle.com>,
"Jose E. Marchesi" <jemarch@....org>,
Beau Belgrave <beaub@...ux.microsoft.com>,
Jens Remus <jremus@...ux.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Florian Weimer <fweimer@...hat.com>, Sam James <sam@...too.org>,
Kees Cook <kees@...nel.org>, Carlos O'Donell <codonell@...hat.com>
Subject: Re: [PATCH v6 5/6] tracing: Show inode and device major:minor in
deferred user space stacktrace
On Thu, Aug 28, 2025 at 12:18:39PM -0700, Linus Torvalds wrote:
> On Thu, 28 Aug 2025 at 11:58, Arnaldo Carvalho de Melo <arnaldo.melo@...il.com> wrote:
> > >Give the damn thing an actual filename or something *useful*, not a
> > >number that user space can't even necessarily match up to anything.
> > A build ID?
> I think that's a better thing than the disgusting inode number, yes.
> That said, I think they are problematic too, in that I don't think
> they are universally available, so if you want to trace some
> executable without build ids - and there are good reasons to do that -
> you might hate being limited that way.
Right, but these days gdb (and other traditional tools) supports it and
downloads it (perf should do it with a one-time sticky question too,
does it already in some cases, unconditionally, that should be fixed as
well), most distros have it:
⬢ [acme@...lbx perf-tools-next]$ file /bin/bash
/bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=707a1c670cd72f8e55ffedfbe94ea98901b7ce3a, for GNU/Linux 3.2.0, stripped
⬢ [acme@...lbx perf-tools-next]$
We have debuginfod-servers that brings ELF images with debug keyed by
that build id and finally build-ids come together with pathnames, so if
one is null, fallback to the other.
Default in fedora:
⬢ [acme@...lbx perf-tools-next]$ echo $DEBUGINFOD_
$DEBUGINFOD_IMA_CERT_PATH $DEBUGINFOD_URLS
⬢ [acme@...lbx perf-tools-next]$ echo $DEBUGINFOD_
$DEBUGINFOD_IMA_CERT_PATH $DEBUGINFOD_URLS
⬢ [acme@...lbx perf-tools-next]$ echo $DEBUGINFOD_IMA_CERT_PATH
/etc/keys/ima:
⬢ [acme@...lbx perf-tools-next]$ echo $DEBUGINFOD_URLS
https://debuginfod.fedoraproject.org/
⬢ [acme@...lbx perf-tools-next]$
I wasn't aware of that IMA stuff.
So even without the mandate and with sometimes not being able to get
that build-id, most of the time they are there and deterministically
allows tooling to fetch it in most cases, I guess that is as far as we
can pragmatically get.
- Arnaldo
> So I think you'd be much better off with just actual pathnames.
>
> Are there no trace events for "mmap this path"? Create a good u64 hash
> from the contents of a 'struct path' (which is just two pointers: the
> dentry and the mnt) when mmap'ing the file, and then you can just
> associate the stack trace entry with that hash.
>
> That should be simple and straightforward, and hashing two pointers
> should be simple and straightforward.
>
> And then matching that hash against the mmap event where the actual
> path was saved off gives you an actual *pathname*. Which is *so* much
> better than those horrific inode numbers.
>
> And yes, yes, obviously filenames can go away and aren't some kind of
> long-term stable thing. But inode numbers can be re-used too, so
> that's no different.
>
> With the "create a hash of 'struct path' contents" you basically have
> an ID that can be associated with whatever the file name was at the
> time it was mmap'ed into the thing you are tracing, which is I think
> what you really want anyway.
>
> Now, what would be even simpler is to not create a hash at all, but
> simply just create the whole pathname when the stack trace entry is
> created. But it would probably waste too much space, since you'd
> probably want to have at least 32 bytes (as opposed to just 64 bits)
> for a (truncated) pathname.
>
> And it would be more expensive than just hashing the dentry/mnt
> pointers, although '%pD' isn't actually *that* expensive. But probably
> expensive enough to not really be acceptable. I'm just throwing it out
> as a stupid idea that at least generates much more usable output than
> the inode numbers do.
>
> Linus
Powered by blists - more mailing lists