linux-kernel - Re: [PATCH 1/2] perf jitdump: Add load

Message-ID: <CAP-5=fV0=s6B=-m=iAwOzQrntG9vXn+jWeWhdQewGJhQBnqG6w@mail.gmail.com>
Date: Fri, 14 Nov 2025 15:58:15 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: maskray@...rceware.org, Arnaldo Carvalho de Melo <acme@...nel.org>, 
	James Clark <james.clark@...aro.org>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>, 
	linux-perf-users@...r.kernel.org, Eric Biggers <ebiggers@...nel.org>, 
	Pablo Galindo <pablogsal@...il.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation

On Fri, Nov 14, 2025 at 3:24 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Fri, Nov 14, 2025 at 11:32:52AM -0800, Ian Rogers wrote:
> > On Fri, Nov 14, 2025 at 10:57 AM Namhyung Kim <namhyung@...nel.org> wrote:
> > >
> > > On Fri, Nov 14, 2025 at 09:33:29AM -0800, Ian Rogers wrote:
> > > > On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@...nel.org> wrote:
> > > > >
> > > > > It was reported that python backtrace with JIT dump was broken after the
> > > > > change to built-in SHA-1 implementation.  It seems python generates the
> > > > > same JIT code for each function.  They will become separate DSOs but the
> > > > > contents are the same.  Only difference is in the symbol name.
> > > > >
> > > > > But this caused a problem: every JIT'ed DSO will have the same
> > > > > build-ID, which confuses perf.  And it resulted in no python
> > > > > symbols (from JIT) in the output.
> > > >
> > > > The lookup of a DSO involves the build ID and the filename. I'm
> > > > confused as to why things weren't deduplicated and why no symbols
> > > > rather than repeatedly the same symbol?
> > >
> > > I don't know, but that's the symptom in the original bug report in the
> > > python github (see Links: below).  I guess the behavior is
> > > non-deterministic.
> > >
> > > >
> > > > > Looking back at the original code before the conversion, it used the
> > > > > load_addr as well as the code section to distinguish each DSO.  I think
> > > > > we should do the same or use symbol table as an additional input for
> > > > > SHA-1.
> > > >
> > > > Hmm.. the build ID for the contents of the code should be a constant.
> > > > As the build ID is a note for the entire ELF file then something is
> > > > wrong with the filename handling it seems.
> > >
> > > When it tries to load symbols from a DSO, it prefers reading from the
> > > build-ID cache over the file system since it trusts build-IDs more than
> > > the path name.  See dso__load() and binary_type_symtab[].
> > >
> > > So having multiple DSO's with the same build-ID can be a problem if they
> > > are in the build-ID cache.  Normally `perf inject -j` won't add the new
> > > JIT-ed DSOs to the build-ID cache but it's still possible.
> >
> > +Fangrui
> >
> > I'm surprised that build IDs don't include symbol names but:
> > ```
> > $ cat a.s
> > .text
> > .global main
> > .global foo
> > main:
> > foo:
> >        ret
> > $ cat b.s
> > .text
> > .global main
> > .global bar
> > main:
> > bar:
> >        ret
> > $ gcc -Wl,--build-id a.s -o a.out
> > $ gcc -Wl,--build-id b.s -o b.out
> > $ readelf -n a.out
> > ...
> >    Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
> > ...
> > $ readelf -n b.out
> > ...
> >    Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
> > ...
> > ```
> > so ugh. Perhaps we need to have jitdump make a single object file (and
> > so 1 build ID) but with multiple unique symbols.
>
> Right, that'd be better.  But I'm afraid some JIT code could spread to
> many segments so it's not possible to create a map to cover all areas
> due to conflicts with other libraries.

I'm not familiar with any JITs doing this. JITs often have to run on
Windows, where you reserve address space for the code cache and then
commit individual pages to actually use them. On Linux you can do a
large PROT_NONE mmap and then mprotect pages within this code cache
region (we should have perf events on this). Apple has some
optimizations with the MAP_JIT mmap flag. SBCL has a code cache within
the executable itself. On x86 you could spread code around within
+/-2GB; on RISC it'd be a pain due to offset limitations.
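
To make the Linux reserve-then-commit pattern concrete, here is a
minimal sketch. It is not taken from any particular JIT: the function
names and the 64 MiB size are made up for illustration, and the bytes
written are placeholders that are never executed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CODE_CACHE_SIZE (64UL << 20)	/* hypothetical 64 MiB reservation */

/* Reserve address space only; no page is accessible yet. */
void *code_cache_reserve(size_t size)
{
	void *p = mmap(NULL, size, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* "Commit" a page-aligned sub-range by changing its protection. */
int code_cache_commit(void *base, size_t off, size_t len, int prot)
{
	return mprotect((char *)base + off, len, prot);
}

/* Returns 0 if reserve, commit, write and protection flip all succeed. */
int code_cache_demo(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *base = code_cache_reserve(CODE_CACHE_SIZE);

	if (!base)
		return -1;
	/* Make the first page writable and emit "code" into it. */
	if (code_cache_commit(base, 0, page, PROT_READ | PROT_WRITE))
		return -1;
	memset(base, 0xc3, 16);	/* placeholder bytes, not run here */
	/* Flip to read/execute before the code would be called (W^X). */
	if (code_cache_commit(base, 0, page, PROT_READ | PROT_EXEC))
		return -1;
	assert(base[0] == 0xc3);
	return munmap(base, CODE_CACHE_SIZE);
}
```

All JIT'ed code then lives inside one contiguous range, which is what
would let a single map cover it.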

Anyway, it sounds like something is off when we are writing the
executable and likely having too many of them. There is also something
off with dsos__find as it should only give a dso if the name and build
ID are matching. The sample should have an IP we turn into a map
within maps, the dso of the map should differ for each of the call
chain IPs (even though they are < page_size()) and the symbol lookup
should work.
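
To spell out the lookup I have in mind, here is a toy sketch. This is
not perf's code: struct toy_dso, toy_dsos_find() and the file names are
hypothetical, and the only assumption is that a match requires both the
name and the build ID to agree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUILD_ID_LEN 20	/* SHA-1 sized */

struct toy_dso {
	const char *name;
	unsigned char build_id[TOY_BUILD_ID_LEN];
};

/* Return the entry whose name AND build ID both match, else NULL. */
const struct toy_dso *toy_dsos_find(const struct toy_dso *dsos, size_t n,
				    const char *name,
				    const unsigned char *build_id)
{
	for (size_t i = 0; i < n; i++) {
		if (!strcmp(dsos[i].name, name) &&
		    !memcmp(dsos[i].build_id, build_id, TOY_BUILD_ID_LEN))
			return &dsos[i];
	}
	return NULL;
}

/* Two DSOs with identical build IDs must still resolve separately. */
int toy_dsos_demo(void)
{
	/* Same code bytes give the same build ID; only the names differ. */
	struct toy_dso dsos[2] = {
		{ .name = "jitted-1234-1.so", .build_id = { 0x9d, 0xd0, 0x37 } },
		{ .name = "jitted-1234-2.so", .build_id = { 0x9d, 0xd0, 0x37 } },
	};
	const struct toy_dso *a = toy_dsos_find(dsos, 2, dsos[0].name,
						dsos[0].build_id);
	const struct toy_dso *b = toy_dsos_find(dsos, 2, dsos[1].name,
						dsos[1].build_id);

	assert(a == &dsos[0] && b == &dsos[1]);
	return (a && b && a != b) ? 0 : -1;
}
```

With a lookup like this, a shared build ID alone would not collapse
the two DSOs into one.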

If we're not going to maintain that jitdump build IDs behave like
build IDs then there is little point using a SHA-1 hash and patch
series like:
https://lore.kernel.org/lkml/20251016205126.2882625-1-irogers@google.com/
are even more tedious to land :-)

Thanks,
Ian