Message-ID: <20251125192943.GA3061247@google.com>
Date: Tue, 25 Nov 2025 19:29:43 +0000
From: Eric Biggers <ebiggers@...nel.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ian Rogers <irogers@...gle.com>,
James Clark <james.clark@...aro.org>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org,
Pablo Galindo <pablogsal@...il.com>,
Fangrui Song <maskray@...rceware.org>
Subject: Re: [PATCH v2 1/2] perf jitdump: Add sym/str-tables to build-ID generation

On Tue, Nov 25, 2025 at 12:07:46AM -0800, Namhyung Kim wrote:
> It was reported that python backtrace with JIT dump was broken after the
> change to built-in SHA-1 implementation. It seems python generates the
> same JIT code for each function. They will become separate DSOs but the
> contents are the same. Only difference is in the symbol name.
>
> But this caused a problem that every JIT'ed DSOs will have the same
> build-ID which makes perf confused. And it resulted in no python
> symbols (from JIT) in the output.
>
> Looking back at the original code before the conversion, it used the
> load_addr as well as the code section to distinguish each DSO. But it'd
> be better to use contents of symtab and strtab instead as it aligns with
> some linker behaviors.
>
> This patch adds a buffer to save all the contents in a single place for
> SHA-1 calculation. Probably we need to add sha1_update() or similar to
> update the existing hash value with different contents and use it here.
> But it's out of scope for this change and I'd like something that can be
> backported to the stable trees easily.
>
> Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> Cc: Eric Biggers <ebiggers@...nel.org>
> Cc: Pablo Galindo <pablogsal@...il.com>
> Cc: Fangrui Song <maskray@...rceware.org>
> Link: https://github.com/python/cpython/issues/139544
> Signed-off-by: Namhyung Kim <namhyung@...nel.org>

That commit actually preserved the behavior of the existing variant of
gen_build_id() that was under #ifdef BUILD_ID_SHA. So I guess that code
was always broken, and it was just never noticed because the alternative
variant of gen_build_id() under #ifdef BUILD_ID_MD5 was used instead?

The MD5 variant of gen_build_id() just hashed the load_addr concatenated
with the code. That's not what this patch does, though. So just to
clarify, you'd actually like to go with a third approach rather than
just restoring the original hash(load_addr || code) approach?
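
To make the three approaches under discussion concrete, here is a minimal Python sketch. It is not perf's C code: the function names, the little-endian 64-bit encoding of load_addr, and the order in which code, symtab, and strtab are concatenated are all assumptions made for illustration only.

```python
import hashlib
import struct


def build_id_code_only(code: bytes) -> str:
    # Hash only the code bytes: identical JIT code yields identical IDs.
    return hashlib.sha1(code).hexdigest()


def build_id_addr_and_code(load_addr: int, code: bytes) -> str:
    # hash(load_addr || code); the "<Q" encoding is an assumption.
    return hashlib.sha1(struct.pack("<Q", load_addr) + code).hexdigest()


def build_id_with_symbols(code: bytes, symtab: bytes, strtab: bytes) -> str:
    # Hash code plus symbol and string tables; the order is an assumption.
    return hashlib.sha1(code + symtab + strtab).hexdigest()
```

With identical code bytes, build_id_code_only() returns the same ID for every DSO, which is the collision described in the quoted report. The other two variants differ as soon as the load address or the symbol name differs.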

Also, I missed that you had actually changed the hash algorithm. I had
assumed the perf folks were pushing SHA-1 because they were already
using it. Given that the algorithm changed, there must not be any
backwards compatibility concerns here, and you should switch to a modern
hash algorithm such as SHA-256 instead.

I'd be glad to add an incremental API if you need it, but I'm confused
why you want SHA-1 and not a modern hash algorithm.
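
As a rough illustration of what an incremental API buys, the Python sketch below compares hashing one concatenated buffer with feeding the same pieces to the hash one at a time. It uses Python's hashlib with SHA-256 purely as a stand-in; the function names are hypothetical and this is not the perf or kernel interface.

```python
import hashlib


def digest_one_shot(parts: list[bytes]) -> str:
    # Copy every piece into one buffer, then hash that buffer once.
    return hashlib.sha256(b"".join(parts)).hexdigest()


def digest_incremental(parts: list[bytes]) -> str:
    # Feed each piece to the hash state in turn; no scratch buffer needed.
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.hexdigest()
```

Both functions return the same digest for the same sequence of pieces, so an incremental interface would avoid the temporary buffer without changing the resulting hash.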

- Eric