lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fUC88TVwnjTuTgU+s=LqqP0xOhNTG5hU27cSbaZRH7Jpg@mail.gmail.com>
Date: Fri, 14 Nov 2025 11:32:52 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>, maskray@...rceware.org
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, James Clark <james.clark@...aro.org>, 
	Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, 
	LKML <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org, 
	Eric Biggers <ebiggers@...nel.org>, Pablo Galindo <pablogsal@...il.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation

On Fri, Nov 14, 2025 at 10:57 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Fri, Nov 14, 2025 at 09:33:29AM -0800, Ian Rogers wrote:
> > On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@...nel.org> wrote:
> > >
> > > It was reported that python backtrace with JIT dump was broken after the
> > > change to built-in SHA-1 implementation.  It seems python generates the
> > > same JIT code for each function.  They will become separate DSOs but the
> > > contents are the same.  Only difference is in the symbol name.
> > >
> > > But this caused a problem that every JIT'ed DSOs will have the same
> > > build-ID which makes perf confused.  And it resulted in no python
> > > symbols (from JIT) in the output.
> >
> > The lookup of a DSO involves the build ID and the filename. I'm
> > confused as to why things weren't deduplicated and why no symbols
> > rather than repeatedly the same symbol?
>
> I don't know, but that's the symptom in the original bug report in the
> python github (see Links: below).  I guess the behavior is
> non-deterministic.
>
> >
> > > Looking back at the original code before the conversion, it used the
> > > load_addr as well as the code section to distinguish each DSO.  I think
> > > we should do the same or use symbol table as an additional input for
> > > SHA-1.
> >
> > Hmm.. the build ID for the contents of the code should be a constant.
> > As the build ID is a note for the entire ELF file then something is
> > wrong with the filename handling it seems.
>
> When it tries to load symbols from a DSO, it prefer reading from the
> build-ID cache than the file system since it trusts build-IDs more than
> the path name.  See dso__load() and binary_type_symtab[].
>
> So having multiple DSO's with the same build-ID can be a problem if they
> are in the build-ID cache.  Normally `perf inject -j` won't add the new
> JIT-ed DSOs to the build-ID cache but it's still possible.

+Fangrui

I'm surprised that build IDs don't include symbol names but:
```
$ cat a.s
.text
.global main
.global foo
main:
foo:
       ret
$ cat b.s
.text
.global main
.global bar
main:
bar:
       ret
$ gcc -Wl,--build-id a.s -o a.out
$ gcc -Wl,--build-id b.s -o b.out
$ readelf -n a.out
...
   Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
...
$ readelf -n b.out
...
   Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
...
```
so ugh. Perhaps we need to have jitdump make a single object file (and
so 1 build ID) but with multiple unique symbols.

Thanks,
Ian




> Thanks,
> Namhyung
>
> >
> > > This patch is a quick-and-dirty fix just to add each byte of the
> > > load_addr to the first 8 bytes of SHA-1 result.  Probably we need to add
> > > sha1_update() or similar to update the existing hash value and use it
> > > here.  I'd like something that can be backported to the stable trees
> > > easily.
> > >
> > > Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> > > Cc: Eric Biggers <ebiggers@...nel.org>
> > > Cc: Pablo Galindo <pablogsal@...il.com>
> > > Link: https://github.com/python/cpython/issues/139544
> > > Signed-off-by: Namhyung Kim <namhyung@...nel.org>
> > > ---
> > >  tools/perf/util/genelf.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/tools/perf/util/genelf.c b/tools/perf/util/genelf.c
> > > index 591548b10e34ef6a..a412e6faf70e37f3 100644
> > > --- a/tools/perf/util/genelf.c
> > > +++ b/tools/perf/util/genelf.c
> > > @@ -395,6 +395,15 @@ jit_write_elf(int fd, uint64_t load_addr __maybe_unused, const char *sym,
> > >          * build-id generation
> > >          */
> > >         sha1(code, csize, bnote.build_id);
> > > +       /* FIXME: update the SHA-1 hash using additional contents */
> > > +       bnote.build_id[0] += (load_addr >> 0) & 0xff;
> > > +       bnote.build_id[1] += (load_addr >> 8) & 0xff;
> > > +       bnote.build_id[2] += (load_addr >> 16) & 0xff;
> > > +       bnote.build_id[3] += (load_addr >> 24) & 0xff;
> > > +       bnote.build_id[4] += (load_addr >> 32) & 0xff;
> > > +       bnote.build_id[5] += (load_addr >> 40) & 0xff;
> > > +       bnote.build_id[6] += (load_addr >> 48) & 0xff;
> > > +       bnote.build_id[7] += (load_addr >> 56) & 0xff;
> > >         bnote.desc.namesz = sizeof(bnote.name); /* must include 0 termination */
> > >         bnote.desc.descsz = sizeof(bnote.build_id);
> > >         bnote.desc.type   = NT_GNU_BUILD_ID;
> > > --
> > > 2.52.0.rc1.455.g30608eb744-goog
> > >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ