Message-ID: <9f1182a6-fac7-41b0-b6db-24ff64afa8b2@linaro.org>
Date: Tue, 13 Jan 2026 12:03:51 +0000
From: James Clark <james.clark@...aro.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Tony Jones <tonyj@...e.de>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
Howard Chu <howardchu95@...il.com>,
Stephen Brennan <stephen.s.brennan@...cle.com>,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation

On 12/01/2026 6:29 pm, Ian Rogers wrote:
> On Mon, Jan 12, 2026 at 6:49 AM Ian Rogers <irogers@...gle.com> wrote:
>>
>> On Mon, Jan 12, 2026 at 3:18 AM James Clark <james.clark@...aro.org> wrote:
>>>
>>> On 11/01/2026 4:13 am, Ian Rogers wrote:
>>>> addr2line is a performance bottleneck in perf, add a libdw based
>>>> implementation that avoids forking addr2line and caches the decoded
>>>> debug information.
>>>>
>>>> Allow the addr2line implementation to be picked via the configuration
>>>> file or --addr2line-style with `perf report`.
>>>>
>>>> Test/fix that inline callchains are properly displayed by perf script.
>>>>
>>>> An example:
>>>> ```
>>>> $ perf record --call-graph dwarf -e cycles:u -- perf test -w inlineloop 1
>>>> [ perf record: Woken up 132 times to write data ]
>>>> [ perf record: Captured and wrote 32.814 MB perf.data (4074 samples) ]
>>>> $ perf script --fields +srcline
>>>> ...
>>>> perf-inlineloop 1814670 293100.228871: 640004 cpu_core/cycles/u:
>>>> 55a11d6e61ee leaf+0x2e
>>>> inlineloop.c:21 (inlined)
>>>> 55a11d6e61ee middle+0x2e
>>>> inlineloop.c:27 (inlined)
>>>> 55a11d6e61ee parent+0x2e (perf)
>>>> inlineloop.c:32
>>>> 55a11d6e629b inlineloop+0x8b (perf)
>>>> inlineloop.c:47
>>>> 55a11d69a3bc run_workload+0x5a (perf)
>>>> builtin-test.c:715
>>>> 55a11d69aa9f cmd_test+0x417 (perf)
>>>> builtin-test.c:825
>>>> 55a11d6155f5 run_builtin+0xd4 (perf)
>>>> perf.c:349
>>>> 55a11d61588d handle_internal_command+0xdd (perf)
>>>> perf.c:401
>>>> 55a11d6159e6 run_argv+0x35 (perf)
>>>> perf.c:445
>>>> 55a11d615d2f main+0x2cb (perf)
>>>> perf.c:553
>>>> 7fae3d233ca7 __libc_start_call_main+0x77 (libc.so.6)
>>>> libc_start_call_main.h:58
>>>> 7fae3d233d64 __libc_start_main_impl+0x84
>>>> libc-start.c:360 (inlined)
>>>> 55a11d565f80 _start+0x20 (perf)
>>>> ??:0
>>>> ...
>>>> ```
>>>>
>>>> v3: Make the caller inline file and line number accurate in the libdw
>>>> addr2line, rather than using the function's declared location.
>>>> Fix reference counts in unwind-libdw. Add fixes tag for srcline
>>>> inline printing.
>>>>
>>>> v2: Fix bias issue with libdwfl functions. Use cu_walk_functions_at
>>>> from perf's dwarf-aux to fully walk inline functions. Add testing
>>>> that inlined functions are shown in the perf script srcline
>>>> callchain information. Add configurability as to which addr2line
>>>> style to use.
>>>> https://lore.kernel.org/lkml/20260110082647.1487574-1-irogers@google.com/
>>>>
>>>> v1: https://lore.kernel.org/lkml/20251122093934.94971-1-irogers@google.com/
>>>>
>>>> Ian Rogers (7):
>>>> perf unwind-libdw: Fix invalid reference counts
>>>> perf addr2line: Add a libdw implementation
>>>> perf addr2line.c: Rename a2l_style to cmd_a2l_style
>>>> perf srcline: Add configuration support for the addr2line style
>>>> perf callchain: Fix srcline printing with inlines
>>>> perf test workload: Add inlineloop test workload
>>>> perf test: Test addr2line unwinding works with inline functions
>>>>
>>>> tools/perf/builtin-report.c | 10 ++
>>>> tools/perf/tests/builtin-test.c | 1 +
>>>> tools/perf/tests/shell/addr2line_inlines.sh | 47 ++++++
>>>> tools/perf/tests/tests.h | 1 +
>>>> tools/perf/tests/workloads/Build | 2 +
>>>> tools/perf/tests/workloads/inlineloop.c | 52 +++++++
>>>> tools/perf/util/Build | 1 +
>>>> tools/perf/util/addr2line.c | 20 +--
>>>> tools/perf/util/config.c | 4 +
>>>> tools/perf/util/dso.c | 2 +
>>>> tools/perf/util/dso.h | 11 ++
>>>> tools/perf/util/evsel_fprintf.c | 8 +-
>>>> tools/perf/util/libdw.c | 153 ++++++++++++++++++++
>>>> tools/perf/util/libdw.h | 60 ++++++++
>>>> tools/perf/util/srcline.c | 116 ++++++++++++++-
>>>> tools/perf/util/srcline.h | 3 +
>>>> tools/perf/util/symbol_conf.h | 10 ++
>>>> tools/perf/util/unwind-libdw.c | 7 +-
>>>> 18 files changed, 486 insertions(+), 22 deletions(-)
>>>> create mode 100755 tools/perf/tests/shell/addr2line_inlines.sh
>>>> create mode 100644 tools/perf/tests/workloads/inlineloop.c
>>>> create mode 100644 tools/perf/util/libdw.c
>>>> create mode 100644 tools/perf/util/libdw.h
>>>>
>>>
>>> I don't see the differences to the other addr2line implementations
>>> anymore, but only because it falls through to the old ones when libdw
>>> fails now.
>>>
>>> For example when building Perf with LLVM it can't get the line in the
>>> inlineloop workload, and there's still a few things in libc and other
>>> system libraries it fails on.
>>
>> Hmm.. I wonder what the issue is. I was looking at the dwarf output
>> from my gcc builds with llvm-dwarfdump. I wonder if LLVM builds are

I see some issues in libc on Ubuntu though, which I assume is compiled
with GCC, although there's no .comment section in it so I can't be sure.
So it's not exclusively LLVM but it does seem like LLVM builds cause a
lot more failures.

>> doing something to confuse libdw? I'll try to investigate. There are
>> quite a few levels of libdw: there's the raw libdw, libdwfl (frontend
>> to libdw) that does the parsing and tries to give things like nested
>> debug scopes (libdwfl is the one needing addresses with a module bias
>> rather than raw file offsets), and then there is the dwarf-aux.c that
>> is in perf and is used by things like probe finding (I believe this
>> doesn't need biased addresses). Anyway, with the biases there are
>> things I can screw up (like in the v1 patch) but maybe the LLVM issue
>> is just a libdw and dwarf-5 kind of thing. Maybe it is ARM specific
>> :-/

Actually I get the same behavior on Arm and x86.

>
> Testing with clang/llvm on x86-64 (dwarf5):
> ```
> $ make -C tools/perf O=/tmp/perf DEBUG=1 CC=clang CXX=clang++
> HOSTCC=clang clean all
> ...
> $ llvm-dwarfdump /tmp/perf/perf
> ...
> 0x0014f852: Compile Unit: length = 0x00000294, format = DWARF32,
> version = 0x0005, unit_type = DW_UT_compile,
> abbr_offset = 0x1879a, addr_size = 0x08 (next unit at 0x0014faea)
>
> 0x0014f85e: DW_TAG_compile_unit
> DW_AT_producer ("Debian clang version 19.1.7 (3+build5)")
> DW_AT_language (DW_LANG_C11)
> DW_AT_name ("tests/workloads/inlineloop.c")
> DW_AT_str_offsets_base (0x0004a550)
> DW_AT_stmt_list (0x0008c3f2)
> DW_AT_comp_dir ("linux/tools/perf")
> DW_AT_low_pc (0x00000000001e61c0)
> DW_AT_high_pc (0x00000000001e62e9)
> DW_AT_addr_base (0x00022248)
> DW_AT_loclists_base (0x0000018a)
> ...
> $ sudo /tmp/perf/perf record --call-graph dwarf -e cycles:u --
> /tmp/perf/perf test -w inlineloop 1
> ...
> $ sudo /tmp/perf/perf script --fields +srcline
> ...
> perf-inlineloop 2284167 423038.015394: 569917 cpu_core/cycles/u:
> 56390020d2c6 leaf+0x26
> inlineloop.c:21 (inlined)
> 56390020d2c6 middle+0x26
> inlineloop.c:27 (inlined)
> 56390020d2c6 parent+0x26 (/tmp/perf/perf)
> ...
> ```
> I ran inside of gdb and confirmed that the libdw code is creating the
> inlined information (breakpoint on libdw_a2l_cb, etc.). So I'm not
> able to reproduce the LLVM issue for now on x86-64.
>
> Thanks,
> Ian
>

If I set this in ~/.perfconfig so the fallback is disabled:

  [addr2line]
  style = libdw

Then:

  $ make LLVM=1 -C tools/perf DEBUG=1 clean all
  $ perf record --delay 1000 -- perf test -w inlineloop 2
  $ perf script --fields ip,srcline
  6012b5957b70
  perf[1f7b70]
  6012b5957b70
  perf[1f7b70]
  ...

x86:

  $ clang -v
  Ubuntu clang version 15.0.7

Arm:

  $ clang -v
  Ubuntu clang version 18.1.8 (11~20.04.2)

Disabling the ~/.perfconfig to re-enable the LLVM fallback works:

(x86)

  $ perf script --fields ip,srcline
  6012b5957b70
  inlineloop.c:20
  6012b5957b70
  inlineloop.c:20

Interestingly, on Arm this results in zeros for line numbers. This is a
completely different issue though which I didn't notice before because I
built with GCC. It falls all the way back to A2L_STYLE_CMD:

(Arm)

  $ perf script --fields ip,srcline
  aaaad0a7828c
  inlineloop.c:0
  aaaad0a7828c
  inlineloop.c:0

  $ addr2line -e `which perf` -a -i -f aaaad0a7828c
  0x0000aaaad0a7828c
  ??
  ??:0

Probably shouldn't get sidetracked by that here though. It's at least
working when compiled with GCC, and neither LLVM nor libdw works on the
Clang build, so it's no worse.
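
For comparing the styles on a larger capture, something like the following could tally how many samples get a real line. This is only a rough sketch: it assumes the `perf script --fields ip,srcline` output alternates an address line with a srcline line as in the snippets above, and the name `tally_srclines` and the bucket names are made up for illustration.

```python
import re
from collections import Counter

ADDR_RE = re.compile(r"^[0-9a-f]+$")             # sample address, e.g. 6012b5957b70
FALLBACK_RE = re.compile(r"^.+\[[0-9a-f]+\]$")   # dso[addr] form, e.g. perf[1f7b70]
SRCLINE_RE = re.compile(r"^(?P<file>.+):(?P<line>\d+)$")  # file:line form


def tally_srclines(text):
    """Bucket each srcline in `perf script --fields ip,srcline` output."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ADDR_RE.match(line):
            continue  # blank line or the address itself
        m = SRCLINE_RE.match(line)
        if FALLBACK_RE.match(line):
            counts["no_srcline"] += 1   # nothing resolved, dso[addr] printed
        elif m and int(m.group("line")) == 0:
            counts["line_zero"] += 1    # file found but line is 0
        elif m:
            counts["resolved"] += 1     # real file:line
        else:
            counts["other"] += 1        # anything unrecognised, e.g. "..."
    return counts


if __name__ == "__main__":
    sample = """\
6012b5957b70
perf[1f7b70]
6012b5957b70
inlineloop.c:20
aaaad0a7828c
inlineloop.c:0
"""
    print(dict(tally_srclines(sample)))
```

Piping the real output in instead of `sample` would give per-style counts to compare between a libdw-only config and the default.
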
>>> But I think it's fine because it doesn't give the wrong line anymore, it
>>> just falls through to another working addr2line implementation.
>>
>> Just to confirm that with gcc builds it isn't failing now? ie it isn't
>> just an addr2line implementation that falls through all the time? I
>> was seeing things working/testing on x86 with gcc.
>>

No, it isn't falling through all the time. The GCC build of Perf always
works with libdw as far as I can see; there is just the occasional fall
through to LLVM for some libc addresses.
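
To be clear about what I mean by "fall through", here is a minimal sketch of the behavior as I understand it, not the actual srcline.c code. The lookup functions are stand-ins that return canned results, and the ordering of styles is an assumption based on what I observed above (libdw first, the addr2line command last).

```python
def first_resolving_style(addr, styles):
    """Return (name, srcline) from the first style that resolves addr.

    `styles` is an ordered list of (name, lookup) pairs, where lookup(addr)
    returns a srcline string or None. Returns (None, None) if all fail.
    """
    for name, lookup in styles:
        srcline = lookup(addr)
        if srcline is not None:
            return name, srcline
    return None, None


# Stand-in lookups with canned answers; these are not perf functions.
def fake_libdw(addr):
    return None  # models libdw failing on the Clang build


def fake_llvm(addr):
    return "inlineloop.c:20"


def fake_cmd(addr):
    return "inlineloop.c:0"


# Assumed default order: every style is tried until one resolves.
styles = [("libdw", fake_libdw), ("llvm", fake_llvm), ("cmd", fake_cmd)]

if __name__ == "__main__":
    print(first_resolving_style(0x1F7B70, styles))
    # A single-entry list models "style = libdw" in ~/.perfconfig.
    print(first_resolving_style(0x1F7B70, [("libdw", fake_libdw)]))
```

With the full list a libdw failure is hidden by the next style, which is why I only saw the difference after pinning the style in ~/.perfconfig.
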
>>> Reviewed-by: James Clark <james.clark@...aro.org>
>>
>> Thanks,
>> Ian