Message-ID: <CAP-5=fVm8NfxyQcEEpKqLeXEGhzzruNFY=sBXGFzyiftQCPWPA@mail.gmail.com>
Date: Mon, 12 Jan 2026 10:29:01 -0800
From: Ian Rogers <irogers@...gle.com>
To: James Clark <james.clark@...aro.org>
Cc: Tony Jones <tonyj@...e.de>, Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Howard Chu <howardchu95@...il.com>,
Stephen Brennan <stephen.s.brennan@...cle.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org
Subject: Re: [PATCH v3 0/7] perf: Add a libdw addr2line implementation
On Mon, Jan 12, 2026 at 6:49 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Mon, Jan 12, 2026 at 3:18 AM James Clark <james.clark@...aro.org> wrote:
> >
> > On 11/01/2026 4:13 am, Ian Rogers wrote:
> > > addr2line is a performance bottleneck in perf, add a libdw based
> > > implementation that avoids forking addr2line and caches the decoded
> > > debug information.
> > >
> > > Allow the addr2line implementation to be picked via the configuration
> > > file or --addr2line-style with `perf report`.
> > >
> > > Test/fix that inline callchains are properly displayed by perf script.
> > >
> > > An example:
> > > ```
> > > $ perf record --call-graph dwarf -e cycles:u -- perf test -w inlineloop 1
> > > [ perf record: Woken up 132 times to write data ]
> > > [ perf record: Captured and wrote 32.814 MB perf.data (4074 samples) ]
> > > $ perf script --fields +srcline
> > > ...
> > > perf-inlineloop 1814670 293100.228871: 640004 cpu_core/cycles/u:
> > > 55a11d6e61ee leaf+0x2e
> > > inlineloop.c:21 (inlined)
> > > 55a11d6e61ee middle+0x2e
> > > inlineloop.c:27 (inlined)
> > > 55a11d6e61ee parent+0x2e (perf)
> > > inlineloop.c:32
> > > 55a11d6e629b inlineloop+0x8b (perf)
> > > inlineloop.c:47
> > > 55a11d69a3bc run_workload+0x5a (perf)
> > > builtin-test.c:715
> > > 55a11d69aa9f cmd_test+0x417 (perf)
> > > builtin-test.c:825
> > > 55a11d6155f5 run_builtin+0xd4 (perf)
> > > perf.c:349
> > > 55a11d61588d handle_internal_command+0xdd (perf)
> > > perf.c:401
> > > 55a11d6159e6 run_argv+0x35 (perf)
> > > perf.c:445
> > > 55a11d615d2f main+0x2cb (perf)
> > > perf.c:553
> > > 7fae3d233ca7 __libc_start_call_main+0x77 (libc.so.6)
> > > libc_start_call_main.h:58
> > > 7fae3d233d64 __libc_start_main_impl+0x84
> > > libc-start.c:360 (inlined)
> > > 55a11d565f80 _start+0x20 (perf)
> > > ??:0
> > > ...
> > > ```
> > >
> > > v3: Make the caller inline file and line number accurate in the libdw
> > > addr2line, rather than using the function's declared location.
> > > Fix reference counts in unwind-libdw. Add fixes tag for srcline
> > > inline printing.
> > >
> > > v2: Fix bias issue with libdwfl functions. Use cu_walk_functions_at
> > > from perf's dwarf-aux to fully walk inline functions. Add testing
> > > that inlined functions are shown in the perf script srcline
> > > callchain information. Add configurability as to which addr2line
> > > style to use.
> > > https://lore.kernel.org/lkml/20260110082647.1487574-1-irogers@google.com/
> > >
> > > v1: https://lore.kernel.org/lkml/20251122093934.94971-1-irogers@google.com/
> > >
> > > Ian Rogers (7):
> > > perf unwind-libdw: Fix invalid reference counts
> > > perf addr2line: Add a libdw implementation
> > > perf addr2line.c: Rename a2l_style to cmd_a2l_style
> > > perf srcline: Add configuration support for the addr2line style
> > > perf callchain: Fix srcline printing with inlines
> > > perf test workload: Add inlineloop test workload
> > > perf test: Test addr2line unwinding works with inline functions
> > >
> > > tools/perf/builtin-report.c | 10 ++
> > > tools/perf/tests/builtin-test.c | 1 +
> > > tools/perf/tests/shell/addr2line_inlines.sh | 47 ++++++
> > > tools/perf/tests/tests.h | 1 +
> > > tools/perf/tests/workloads/Build | 2 +
> > > tools/perf/tests/workloads/inlineloop.c | 52 +++++++
> > > tools/perf/util/Build | 1 +
> > > tools/perf/util/addr2line.c | 20 +--
> > > tools/perf/util/config.c | 4 +
> > > tools/perf/util/dso.c | 2 +
> > > tools/perf/util/dso.h | 11 ++
> > > tools/perf/util/evsel_fprintf.c | 8 +-
> > > tools/perf/util/libdw.c | 153 ++++++++++++++++++++
> > > tools/perf/util/libdw.h | 60 ++++++++
> > > tools/perf/util/srcline.c | 116 ++++++++++++++-
> > > tools/perf/util/srcline.h | 3 +
> > > tools/perf/util/symbol_conf.h | 10 ++
> > > tools/perf/util/unwind-libdw.c | 7 +-
> > > 18 files changed, 486 insertions(+), 22 deletions(-)
> > > create mode 100755 tools/perf/tests/shell/addr2line_inlines.sh
> > > create mode 100644 tools/perf/tests/workloads/inlineloop.c
> > > create mode 100644 tools/perf/util/libdw.c
> > > create mode 100644 tools/perf/util/libdw.h
> > >
> >
> > I don't see the differences to the other addr2line implementations
> > anymore, but only because it falls through to the old ones when libdw
> > fails now.
> >
> > For example when building Perf with LLVM it can't get the line in the
> > inlineloop workload, and there's still a few things in libc and other
> > system libraries it fails on.
>
> Hmm.. I wonder what the issue is. I was looking at the dwarf output
> from my gcc builds with llvm-dwarfdump. I wonder if LLVM builds are
> doing something to confuse libdw? I'll try to investigate. There are
> quite a few levels of libdw: there's the raw libdw, libdwfl (frontend
> to libdw) that does the parsing and tries to give things like nested
> debug scopes (libdwfl is the one needing addresses with a module bias
> rather than raw file offsets), and then there is the dwarf-aux.c that
> is in perf and is used by things like probe finding (I believe this
> doesn't need biased addresses). Anyway, with the biases there are
> things I can screw up (like in the v1 patch) but maybe the LLVM issue
> is just a libdw and dwarf-5 kind of thing. Maybe it is ARM specific
> :-/
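To make the bias point concrete, here is a small arithmetic sketch. It uses the sample address and the CU range from the clang test output further down in this message. The load bias is an assumption: it is not stated anywhere in this thread, and the value below is simply the page-aligned offset that makes the sample address land inside the CU. The helper names are hypothetical and this is not how perf or libdwfl implement the lookup.

```python
# Sketch of the runtime-address vs. DWARF-address relationship.
# ASSUMPTION: LOAD_BIAS is hypothetical (not taken from the thread); it is the
# page-aligned value that places SAMPLE_IP inside the CU range shown by
# llvm-dwarfdump.

CU_LOW_PC = 0x1E61C0         # DW_AT_low_pc of tests/workloads/inlineloop.c
CU_HIGH_PC = 0x1E62E9        # DW_AT_high_pc as printed (first address past the CU)
SAMPLE_IP = 0x56390020D2C6   # runtime address of the leaf/middle/parent frames
LOAD_BIAS = 0x563900027000   # hypothetical load bias of the perf binary


def runtime_to_dwarf(ip, bias):
    """Convert a runtime address into the address space used by the DWARF."""
    return ip - bias


def in_cu(addr, low_pc, high_pc):
    """True if addr falls in the half-open range [low_pc, high_pc)."""
    return low_pc <= addr < high_pc


dwarf_addr = runtime_to_dwarf(SAMPLE_IP, LOAD_BIAS)
print(hex(dwarf_addr), in_cu(dwarf_addr, CU_LOW_PC, CU_HIGH_PC))
```

Mixing the two address spaces (passing a biased address where an unbiased one is expected, or the reverse) makes the lookup miss the CU entirely, which is the kind of mistake the v2 bias fix addressed.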
Testing with clang/llvm on x86-64 (dwarf5):
```
$ make -C tools/perf O=/tmp/perf DEBUG=1 CC=clang CXX=clang++ HOSTCC=clang clean all
...
$ llvm-dwarfdump /tmp/perf/perf
...
0x0014f852: Compile Unit: length = 0x00000294, format = DWARF32, version = 0x0005, unit_type = DW_UT_compile, abbr_offset = 0x1879a, addr_size = 0x08 (next unit at 0x0014faea)
0x0014f85e: DW_TAG_compile_unit
DW_AT_producer ("Debian clang version 19.1.7 (3+build5)")
DW_AT_language (DW_LANG_C11)
DW_AT_name ("tests/workloads/inlineloop.c")
DW_AT_str_offsets_base (0x0004a550)
DW_AT_stmt_list (0x0008c3f2)
DW_AT_comp_dir ("linux/tools/perf")
DW_AT_low_pc (0x00000000001e61c0)
DW_AT_high_pc (0x00000000001e62e9)
DW_AT_addr_base (0x00022248)
DW_AT_loclists_base (0x0000018a)
...
$ sudo /tmp/perf/perf record --call-graph dwarf -e cycles:u -- /tmp/perf/perf test -w inlineloop 1
...
$ sudo /tmp/perf/perf script --fields +srcline
...
perf-inlineloop 2284167 423038.015394: 569917 cpu_core/cycles/u:
56390020d2c6 leaf+0x26
inlineloop.c:21 (inlined)
56390020d2c6 middle+0x26
inlineloop.c:27 (inlined)
56390020d2c6 parent+0x26 (/tmp/perf/perf)
...
```
I ran inside of gdb and confirmed that the libdw code is creating the
inlined information (breakpoint on libdw_a2l_cb, etc.). So I'm not
able to reproduce the LLVM issue for now on x86-64.
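
For anyone who wants to check their own output for the same thing, a rough sketch of a checker follows. It assumes the `perf script --fields +srcline` layout shown above (an address/symbol line followed by a srcline line, with `(inlined)` as a suffix) and handles the callchain of a single sample. The function and field names are hypothetical; this is not the logic in addr2line_inlines.sh.

```python
import re

# Matches frame lines such as "56390020d2c6 parent+0x26 (/tmp/perf/perf)".
# ASSUMPTION: the layout follows the perf script excerpt quoted in this message.
FRAME_RE = re.compile(
    r"^\s*([0-9a-f]+)\s+(\S+?)\+0x([0-9a-f]+)(?:\s+\((.+)\))?\s*$"
)
INLINED = "(inlined)"


def parse_callchain(lines):
    """Parse one sample's callchain into a list of frame dicts."""
    frames = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        m = FRAME_RE.match(text)
        if m:
            frames.append({
                "ip": int(m.group(1), 16),
                "sym": m.group(2),
                "dso": m.group(4),
                "srcline": None,
                "inlined": False,
            })
        elif frames and frames[-1]["srcline"] is None:
            # A non-frame line after a frame is that frame's srcline.
            inlined = text.endswith(INLINED)
            if inlined:
                text = text[: -len(INLINED)].strip()
            frames[-1]["srcline"] = text
            frames[-1]["inlined"] = inlined
    return frames


SAMPLE = """
56390020d2c6 leaf+0x26
inlineloop.c:21 (inlined)
56390020d2c6 middle+0x26
inlineloop.c:27 (inlined)
56390020d2c6 parent+0x26 (/tmp/perf/perf)
"""

for frame in parse_callchain(SAMPLE.splitlines()):
    print(frame["sym"], frame["srcline"], frame["inlined"])
```

If the libdw path silently fell through to a backend without inline support, `leaf` and `middle` would be missing from the parsed frames.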
Thanks,
Ian
> > But I think it's fine because it doesn't give the wrong line anymore, it
> > just falls through to another working addr2line implementation.
>
> Just to confirm that with gcc builds it isn't failing now? i.e. it isn't
> just an addr2line implementation that falls through all the time? I
> was seeing things working/testing on x86 with gcc.
>
> > Reviewed-by: James Clark <james.clark@...aro.org>
>
> Thanks,
> Ian