Message-ID: <CAP-5=fWsFLkYGFFyzSrpZNdRLpjaTZAuW7-YVUr-zMVH5dk8eg@mail.gmail.com>
Date: Fri, 14 Nov 2025 09:33:29 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, James Clark <james.clark@...aro.org>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, linux-perf-users@...r.kernel.org,
Eric Biggers <ebiggers@...nel.org>, Pablo Galindo <pablogsal@...il.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation
On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> It was reported that Python backtraces with JIT dump broke after the
> change to the built-in SHA-1 implementation. It seems Python generates
> the same JIT code for each function. They become separate DSOs but the
> contents are the same. The only difference is in the symbol name.
>
> This caused a problem: every JIT'ed DSO gets the same build-ID, which
> confuses perf. As a result, no Python symbols (from JIT) appear in the
> output.
The lookup of a DSO involves both the build ID and the filename. I'm
confused as to why the DSOs weren't deduplicated, and why the result was
no symbols at all rather than the same symbol repeated.
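
To make the point about lookup concrete, here is a minimal sketch that
assumes a DSO is matched on both filename and build ID. The struct and
function names (dso_key, dso_key_equal) are hypothetical and are not
perf's actual data structures; the sketch only shows that two DSOs with
identical build IDs stay distinct while their filenames differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUILD_ID_SIZE 20 /* SHA-1 digest length in bytes */

/* Hypothetical lookup key; not perf's real struct dso layout. */
struct dso_key {
	const char *filename;
	uint8_t build_id[BUILD_ID_SIZE];
};

/* Two keys match only when both the filename and the build ID match. */
static bool dso_key_equal(const struct dso_key *a, const struct dso_key *b)
{
	assert(a && b && a->filename && b->filename);
	return strcmp(a->filename, b->filename) == 0 &&
	       memcmp(a->build_id, b->build_id, BUILD_ID_SIZE) == 0;
}
```

Under that assumption, two JIT'ed DSOs with made-up names such as
"jitted-1-1.so" and "jitted-1-2.so" compare unequal even with the same
build ID, which is why identical build IDs alone would not be expected
to collapse them.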
> Looking back at the original code before the conversion, it used the
> load_addr as well as the code section to distinguish each DSO. I think
> we should do the same or use symbol table as an additional input for
> SHA-1.
Hmm, the build ID for the contents of the code should be constant.
Since the build ID is a note covering the entire ELF file, it seems
something is wrong with the filename handling.
Thanks,
Ian
> This patch is a quick-and-dirty fix just to add each byte of the
> load_addr to the first 8 bytes of SHA-1 result. Probably we need to add
> sha1_update() or similar to update the existing hash value and use it
> here. I'd like something that can be backported to the stable trees
> easily.
>
> Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> Cc: Eric Biggers <ebiggers@...nel.org>
> Cc: Pablo Galindo <pablogsal@...il.com>
> Link: https://github.com/python/cpython/issues/139544
> Signed-off-by: Namhyung Kim <namhyung@...nel.org>
> ---
> tools/perf/util/genelf.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/tools/perf/util/genelf.c b/tools/perf/util/genelf.c
> index 591548b10e34ef6a..a412e6faf70e37f3 100644
> --- a/tools/perf/util/genelf.c
> +++ b/tools/perf/util/genelf.c
> @@ -395,6 +395,15 @@ jit_write_elf(int fd, uint64_t load_addr __maybe_unused, const char *sym,
> * build-id generation
> */
> sha1(code, csize, bnote.build_id);
> + /* FIXME: update the SHA-1 hash using additional contents */
> + bnote.build_id[0] += (load_addr >> 0) & 0xff;
> + bnote.build_id[1] += (load_addr >> 8) & 0xff;
> + bnote.build_id[2] += (load_addr >> 16) & 0xff;
> + bnote.build_id[3] += (load_addr >> 24) & 0xff;
> + bnote.build_id[4] += (load_addr >> 32) & 0xff;
> + bnote.build_id[5] += (load_addr >> 40) & 0xff;
> + bnote.build_id[6] += (load_addr >> 48) & 0xff;
> + bnote.build_id[7] += (load_addr >> 56) & 0xff;
> bnote.desc.namesz = sizeof(bnote.name); /* must include 0 termination */
> bnote.desc.descsz = sizeof(bnote.build_id);
> bnote.desc.type = NT_GNU_BUILD_ID;
> --
> 2.52.0.rc1.455.g30608eb744-goog
>
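
For reference, the eight additions in the hunk above can be written as a
loop. This is a standalone sketch, not part of the patch:
mix_load_addr() is a hypothetical helper name, and the digest is just a
20-byte array standing in for bnote.build_id. Each byte of load_addr,
least significant first, is added to the corresponding digest byte and
wraps modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUILD_ID_SIZE 20 /* SHA-1 digest length in bytes */

/*
 * Hypothetical helper: add byte i of load_addr to build_id[i] for the
 * first 8 bytes. Bytes 8..19 of the digest are left untouched.
 */
static void mix_load_addr(uint8_t build_id[BUILD_ID_SIZE], uint64_t load_addr)
{
	size_t i;

	assert(build_id != NULL);
	for (i = 0; i < sizeof(load_addr); i++)
		build_id[i] += (uint8_t)(load_addr >> (8 * i));
}
```

Two buffers holding the same digest diverge after mixing in different
load addresses, which is the property the patch relies on. Note that
this is a plain byte-wise addition, not a rehash, so it does not have
the collision resistance that feeding load_addr into SHA-1 would give.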