[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fWbXEv+MLtsF30PTOGMz9Wcu7WG2TkF5x=so9DBm6eKvQ@mail.gmail.com>
Date: Wed, 11 Jun 2025 15:05:28 -0700
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>,
James Clark <james.clark@...aro.org>, Howard Chu <howardchu95@...il.com>,
Yicong Yang <yangyicong@...ilicon.com>, Michael Petlan <mpetlan@...hat.com>,
Andi Kleen <ak@...ux.intel.com>, Christophe Leroy <christophe.leroy@...roup.eu>,
"Dr. David Alan Gilbert" <linux@...blig.org>,
Krzysztof Łopatowski <krzysztof.m.lopatowski@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] perf debug: Add function symbols to dump_stack
On Tue, Jun 3, 2025 at 2:55 PM Ian Rogers <irogers@...gle.com> wrote:
>
> On Mon, Jun 2, 2025 at 10:32 AM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Thu, May 29, 2025 at 3:02 PM Ian Rogers <irogers@...gle.com> wrote:
> > >
> > > On Thu, May 29, 2025 at 1:21 PM Arnaldo Carvalho de Melo
> > > <acme@...nel.org> wrote:
> > > >
> > > > On Wed, May 28, 2025 at 10:23:22PM -0700, Ian Rogers wrote:
> > > > > Symbolize stack traces by creating a live machine. Add this
> > > > > functionality to dump_stack and switch dump_stack users to use
> > > > > it. Switch TUI to use it. Add stack traces to the child test function
> > > > > which can be useful to diagnose blocked code.
> > > > >
> > > > > Example output:
> > > > > ```
> > > > > 8: PERF_RECORD_* events & perf_sample fields : Running (1 active)
> > > > > ^C
> > > > > Signal (2) while running tests.
> > > > > Terminating tests with the same signal
> > > > > Internal test harness failure. Completing any started tests:
> > > > > : 8: PERF_RECORD_* events & perf_sample fields:
> > > >
> > > > So you are testing it with:
> > > >
> > > > root@...ber:~# perf test PERF_RECORD
> > > > 8: PERF_RECORD_* events & perf_sample fields : Running (1 active)
> > > > ^C
> > > > Signal (2) while running tests.
> > > > Terminating tests with the same signal
> > > > Internal test harness failure. Completing any started tests:
> > > > 8: PERF_RECORD_* events & perf_sample fields : Skip (permissions)
> > > > root@...ber:~#
> > > >
> > > > ?
> > >
> > > You are running the test as a child and by default the child test
> > > output is only displayed if verbose is enabled. I use verbose=2 below
> > > as verbose=1 won't display anything unless the child test gives a fail
> > > exit value:
> > > ```
> > > $ git log --oneline
> > > b9ac06abfde9 perf debug: Add function symbols to dump_stack
> > > e561806265ed perf machine: Factor creating a "live" machine out of dwarf-unwind
> > > 628e124404b3 perf tests switch-tracking: Fix timestamp comparison
> > > $ make -C tools/perf O=/tmp/perf
> > > ...
> > > $ sudo /tmp/perf/perf test -vv PERF_RECORD
> > > 8: PERF_RECORD_* events & perf_sample fields:
> > > 8: PERF_RECORD_* events & perf_sample fields :
> > > Running (1 active)
> > > ^C
> > > Signal (2) while running tests.
> > > Terminating tests with the same signal
> > > Internal test harness failure. Completing any started tests:
> > > : 8: PERF_RECORD_* events & perf_sample fields:
> > >
> > > ---- unexpected signal (2) ----
> > > #0 0x5617ec45e5e3 in child_test_sig_handler builtin-test.c:0
> > > #1 0x7f855c649df0 in __restore_rt libc_sigaction.c:0
> > > #2 0x7f855c699687 in __internal_syscall_cancel cancellation.c:64
> > > #3 0x7f855c6e5f7a in clock_nanosleep@...BC_2.2.5 clock_nanosleep.c:72
> > > #4 0x7f855c6f1393 in __nanosleep nanosleep.c:26
> > > #5 0x7f855c702d68 in __sleep sleep.c:55
> > > #6 0x5617ec46ebfb in test__PERF_RECORD perf-record.c:0
> > > #7 0x5617ec45e4f0 in run_test_child builtin-test.c:0
> > > #8 0x5617ec3faf0d in start_command run-command.c:127
> > > #9 0x5617ec45f433 in __cmd_test builtin-test.c:0
> > > #10 0x5617ec45faff in cmd_test perf[147aff]
> > > #11 0x5617ec3ed960 in run_builtin perf.c:0
> > > #12 0x5617ec3edc7b in handle_internal_command perf.c:0
> > > #13 0x5617ec368d33 in main perf[50d33]
> > > #14 0x7f855c633ca8 in __libc_start_call_main libc_start_call_main.h:74
> > > #15 0x7f855c633d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
> > > #16 0x5617ec369381 in _start perf[51381]
> > >
> > > ---- unexpected signal (2) ----
> > > #0 0x5617ec45e5e3 in child_test_sig_handler builtin-test.c:0
> > > #1 0x7f855c649df0 in __restore_rt libc_sigaction.c:0
> > > #2 0x7f855c6a3a14 in pthread_sigmask@...BC_2.2.5 pthread_sigmask.c:45
> > > #3 0x7f855c649fd9 in __GI___sigprocmask sigprocmask.c:26
> > > #4 0x7f855c72601b in __longjmp_chk longjmp.c:36
> > > #5 0x5617ec45e600 in print_test_result.isra.0 builtin-test.c:0
> > > #6 0x7f855c649df0 in __restore_rt libc_sigaction.c:0
> > > #7 0x7f855c699687 in __internal_syscall_cancel cancellation.c:64
> > > #8 0x7f855c6e5f7a in clock_nanosleep@...BC_2.2.5 clock_nanosleep.c:72
> > > #9 0x7f855c6f1393 in __nanosleep nanosleep.c:26
> > > #10 0x7f855c702d68 in __sleep sleep.c:55
> > > #11 0x5617ec46ebfb in test__PERF_RECORD perf-record.c:0
> > > #12 0x5617ec45e4f0 in run_test_child builtin-test.c:0
> > > #13 0x5617ec3faf0d in start_command run-command.c:127
> > > #14 0x5617ec45f433 in __cmd_test builtin-test.c:0
> > > #15 0x5617ec45faff in cmd_test perf[147aff]
> > > #16 0x5617ec3ed960 in run_builtin perf.c:0
> > > #17 0x5617ec3edc7b in handle_internal_command perf.c:0
> > > #18 0x5617ec368d33 in main perf[50d33]
> > > #19 0x7f855c633ca8 in __libc_start_call_main libc_start_call_main.h:74
> > > #20 0x7f855c633d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
> > > #21 0x5617ec369381 in _start perf[51381]
> > > 8: PERF_RECORD_* events & perf_sample fields :
> > > Skip (permissions)
> > > ```
> > >
> > >
> > > > I built it with DEBUG=1 and without, and with your patch, following your
> > > > example output, I'm not being able to reproduce.
> > > >
> > > > Tried it as well with:
> > > >
> > > > ⬢ [acme@...lbx perf-tools-next]$ cat segv.patch
> > > > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > > > index 8b30c6f16a9eeac1..e55d86f1097d6d79 100644
> > > > --- a/tools/perf/util/symbol.c
> > > > +++ b/tools/perf/util/symbol.c
> > > > @@ -402,6 +402,8 @@ static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
> > > > {
> > > > struct rb_node *n;
> > > >
> > > > + *(int *)NULL = 0;
> > > > +
> > > > if (symbols == NULL)
> > > > return NULL;
> > > >
> > > > ⬢ [acme@...lbx perf-tools-next]$ patch -p1 < segv.patch
> > > > patching file tools/perf/util/symbol.c
> > > > ⬢ [acme@...lbx perf-tools-next]$ m
> > > > rm: cannot remove '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace/__pycache__/Core.cpython-313.pyc': Permission denied
> > > > make: Entering directory '/home/acme/git/perf-tools-next/tools/perf'
> > > > BUILD: Doing 'make -j32' parallel build
> > > > Warning: Kernel ABI header differences:
> > > > diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
> > > >
> > > > Auto-detecting system features:
> > > > ... libdw: [ on ]
> > > > ... glibc: [ on ]
> > > > ... libelf: [ on ]
> > > > ... libnuma: [ on ]
> > > > ... numa_num_possible_cpus: [ on ]
> > > > ... libperl: [ on ]
> > > > ... libpython: [ on ]
> > > > ... libcrypto: [ on ]
> > > > ... libcapstone: [ on ]
> > > > ... llvm-perf: [ on ]
> > > > ... zlib: [ on ]
> > > > ... lzma: [ on ]
> > > > ... get_cpuid: [ on ]
> > > > ... bpf: [ on ]
> > > > ... libaio: [ on ]
> > > > ... libzstd: [ on ]
> > > >
> > > > INSTALL libsubcmd_headers
> > > > INSTALL libperf_headers
> > > > INSTALL libapi_headers
> > > > INSTALL libsymbol_headers
> > > > INSTALL libbpf_headers
> > > > CC /tmp/build/perf-tools-next/util/symbol.o
> > > > LD /tmp/build/perf-tools-next/util/perf-util-in.o
> > > > LD /tmp/build/perf-tools-next/perf-util-in.o
> > > > AR /tmp/build/perf-tools-next/libperf-util.a
> > > > LINK /tmp/build/perf-tools-next/perf
> > > > GEN /tmp/build/perf-tools-next/python/perf.cpython-313-x86_64-linux-gnu.so
> > > > INSTALL binaries
> > > > INSTALL tests
> > > > INSTALL libperf-jvmti.so
> > > > INSTALL libexec
> > > > INSTALL perf-archive
> > > > INSTALL perf-iostat
> > > > INSTALL perl-scripts
> > > > INSTALL python-scripts
> > > > INSTALL dlfilters
> > > > INSTALL perf_completion-script
> > > > INSTALL perf-tip
> > > > make: Leaving directory '/home/acme/git/perf-tools-next/tools/perf'
> > > > 18: 'import perf' in python : Ok
> > > > ⬢ [acme@...lbx perf-tools-next]$
> > > >
> > > > root@...ber:~# perf top
> > > > perf: Segmentation fault
> > > > -------- backtrace --------
> > > > Segmentation fault (core dumped)
> > > > root@...ber:~#
> > > >
> > > > Tried also with this, but probably something else is at play:
> > > >
> > > > root@...ber:~# perf probe -x ~/bin/perf dump_stack
> > > > Added new event:
> > > > probe_perf:dump_stack (on dump_stack in /home/acme/bin/perf)
> > > >
> > > > You can now use it in all perf tools, such as:
> > > >
> > > > perf record -e probe_perf:dump_stack -aR sleep 1
> > > >
> > > > root@...ber:~#
> > > > root@...ber:~# perf trace -e probe_perf:dump_stack/max-stack=16/ perf top
> > > > perf: Segmentation fault
> > > > -------- backtrace --------
> > > > root@...ber:~#
> > > >
> > > > Running perf trace on a separate terminal it also doesn't catch
> > > > dump_stack being called (that probe point).
> > > >
> > > > root@...ber:~# perf top --stdio
> > > > PerfTop: 0 irqs/sec kernel: 0.0% exact: 0.0% lost: 0/0 drop: 0/0 [4000Hz cycles:P], (all, 32 CPUs)
> > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > > >
> > > > perf: Segmentation fault
> > > > Segmentation fault (core dumped)
> > > > root@...ber:~#
> > > >
> > > > root@...ber:~# nm ~/bin/perf | grep dump_stack
> > > > 0000000000637eaa T __dump_stack
> > > > 00000000006380fa T dump_stack
> > > > 000000000063816b T sighandler_dump_stack
> > > > root@...ber:~#
> > > >
> > > > ⬢ [acme@...lbx perf-tools-next]$ rpm -q glibc
> > > > glibc-2.41-5.fc42.x86_64
> > > > root@...ber:~# uname -a
> > > > Linux number 6.14.8-300.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 22 19:26:02 UTC 2025 x86_64 GNU/Linux
> > > > root@...ber:~#
> > > >
> > > > What am I missing?
> > >
> > > For perf top I can do:
> > > ```
> > > $ sudo /tmp/perf/perf top
> > > ```
> > > in a different terminal:
> > > ```
> > > $ sudo killall -11 perf
> > > ```
> > > then back in the first terminal:
> > > ```
> > > perf: Segmentation fault
> > > -------- backtrace --------
> > > #0 0x55ee68647918 in ui__signal_backtrace setup.c:110
> > > #1 0x7fe2d3849df0 in __restore_rt libc_sigaction.c:0
> > > #2 0x7fe2d38a49ee in __syscall_cancel_arch syscall_cancel.S:56
> > > #3 0x7fe2d3899668 in __internal_syscall_cancel cancellation.c:54
> > > #4 0x7fe2d38996ad in __syscall_cancel cancellation.c:79
> > > #5 0x7fe2d390d9c6 in __poll poll.c:43
> > > #6 0x55ee6855dfcb in fdarray__poll array.c:139
> > > #7 0x55ee68565a05 in perf_evlist__poll evlist.c:391
> > > #8 0x55ee68673b5a in evlist__poll evlist.c:594
> > > #9 0x55ee684e7a61 in __cmd_top builtin-top.c:1369
> > > #10 0x55ee684e9a6c in cmd_top builtin-top.c:1856
> > > #11 0x55ee6855d1f0 in run_builtin perf.c:351
> > > #12 0x55ee6855d497 in handle_internal_command perf.c:404
> > > #13 0x55ee6855d5f0 in run_argv perf.c:451
> > > #14 0x55ee6855d939 in main perf.c:558
> > > #15 0x7fe2d3833ca8 in __libc_start_call_main libc_start_call_main.h:74
> > > #16 0x7fe2d3833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
> > > #17 0x55ee684b0a91 in _start perf[4ca91]
> > > ```
> > > As the same symbol code is used for the backtrace then adding the segv
> > > there is likely causing a segv in the signal handler in __dump_stack.
> >
> > Just to note, it would be nice for this to land as I need to rebase:
> > https://lore.kernel.org/lkml/20250529044000.759937-19-irogers@google.com/
> > due to the machine changes. I believe the patches are WAI as shown in
> > the output.
>
> Ok, I see this made it into v6.16 PR:
> https://lore.kernel.org/lkml/20250603203501.1961487-1-acme@kernel.org/
Actually just the first patch. I'll rebase and resend just patch 2 as v3.
Thanks,
Ian
Powered by blists - more mailing lists