[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aS994rls5y5xP1br@google.com>
Date: Tue, 2 Dec 2025 16:01:38 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
James Clark <james.clark@...aro.org>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org,
Steven Rostedt <rostedt@...dmis.org>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Indu Bhagat <indu.bhagat@...cle.com>,
Jens Remus <jremus@...ux.ibm.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v6 5/6] perf tools: Merge deferred user callchains
On Tue, Dec 02, 2025 at 03:14:31PM -0800, Ian Rogers wrote:
> On Thu, Nov 20, 2025 at 3:48 PM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > Save samples with deferred callchains in a separate list and deliver
> > them after merging the user callchains. If users don't want to merge
> > they can set tool->merge_deferred_callchains to false to prevent the
> > behavior.
> >
> > With previous result, now perf script will show the merged callchains.
> >
> > $ perf script
> > ...
> > pwd 2312 121.163435: 249113 cpu/cycles/P:
> > ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
> > ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
> > ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
> > ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
> > ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
> > ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
> > ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
> > 7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> > 7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> > 7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> > ...
> >
> > The old output can be get using --no-merge-callchain option.
> > Also perf report can get the user callchain entry at the end.
> >
> > $ perf report --no-children --stdio -q -S __build_id_parse.isra.0
> > # symbol: __build_id_parse.isra.0
> > 8.40% pwd [kernel.kallsyms]
> > |
> > ---__build_id_parse.isra.0
> > perf_event_mmap
> > mprotect_fixup
> > do_mprotect_pkey
> > __x64_sys_mprotect
> > do_syscall_64
> > entry_SYSCALL_64_after_hwframe
> > mprotect
> > _dl_sysdep_start
> > _dl_start_user
> >
> > Signed-off-by: Namhyung Kim <namhyung@...nel.org>
>
> Reviewed-by: Ian Rogers <irogers@...gle.com>
>
> > ---
> > tools/perf/Documentation/perf-script.txt | 5 ++
> > tools/perf/builtin-inject.c | 1 +
> > tools/perf/builtin-report.c | 1 +
> > tools/perf/builtin-script.c | 4 ++
> > tools/perf/util/callchain.c | 29 +++++++++
> > tools/perf/util/callchain.h | 3 +
> > tools/perf/util/evlist.c | 1 +
> > tools/perf/util/evlist.h | 2 +
> > tools/perf/util/session.c | 79 +++++++++++++++++++++++-
> > tools/perf/util/tool.c | 2 +
> > tools/perf/util/tool.h | 1 +
> > 11 files changed, 127 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
> > index 28bec7e78bc858ba..03d1129606328d6d 100644
> > --- a/tools/perf/Documentation/perf-script.txt
> > +++ b/tools/perf/Documentation/perf-script.txt
> > @@ -527,6 +527,11 @@ include::itrace.txt[]
> > The known limitations include exception handing such as
> > setjmp/longjmp will have calls/returns not match.
> >
> > +--merge-callchains::
> > + Enable merging deferred user callchains if available. This is the
> > + default behavior. If you want to see separate CALLCHAIN_DEFERRED
> > + records for some reason, use --no-merge-callchains explicitly.
> > +
> > :GMEXAMPLECMD: script
> > :GMEXAMPLESUBCMD:
> > include::guest-files.txt[]
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index bd9245d2dd41aa48..51d2721b6db9dccb 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -2527,6 +2527,7 @@ int cmd_inject(int argc, const char **argv)
> > inject.tool.auxtrace = perf_event__repipe_auxtrace;
> > inject.tool.bpf_metadata = perf_event__repipe_op2_synth;
> > inject.tool.dont_split_sample_group = true;
> > + inject.tool.merge_deferred_callchains = false;
> > inject.session = __perf_session__new(&data, &inject.tool,
> > /*trace_event_repipe=*/inject.output.is_pipe,
> > /*host_env=*/NULL);
> > diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> > index 2bc269f5fcef8023..add6b1c2aaf04270 100644
> > --- a/tools/perf/builtin-report.c
> > +++ b/tools/perf/builtin-report.c
> > @@ -1614,6 +1614,7 @@ int cmd_report(int argc, const char **argv)
> > report.tool.event_update = perf_event__process_event_update;
> > report.tool.feature = process_feature_event;
> > report.tool.ordering_requires_timestamps = true;
> > + report.tool.merge_deferred_callchains = !dump_trace;
> >
> > session = perf_session__new(&data, &report.tool);
> > if (IS_ERR(session)) {
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index 85b42205a71b3993..62e43d3c5ad731a0 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -4009,6 +4009,7 @@ int cmd_script(int argc, const char **argv)
> > bool header_only = false;
> > bool script_started = false;
> > bool unsorted_dump = false;
> > + bool merge_deferred_callchains = true;
> > char *rec_script_path = NULL;
> > char *rep_script_path = NULL;
> > struct perf_session *session;
> > @@ -4162,6 +4163,8 @@ int cmd_script(int argc, const char **argv)
> > "Guest code can be found in hypervisor process"),
> > OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
> > "Enable LBR callgraph stitching approach"),
> > + OPT_BOOLEAN('\0', "merge-callchains", &merge_deferred_callchains,
> > + "Enable merge deferred user callchains"),
> > OPTS_EVSWITCH(&script.evswitch),
> > OPT_END()
> > };
> > @@ -4418,6 +4421,7 @@ int cmd_script(int argc, const char **argv)
> > script.tool.throttle = process_throttle_event;
> > script.tool.unthrottle = process_throttle_event;
> > script.tool.ordering_requires_timestamps = true;
> > + script.tool.merge_deferred_callchains = merge_deferred_callchains;
> > session = perf_session__new(&data, &script.tool);
> > if (IS_ERR(session))
> > return PTR_ERR(session);
> > diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> > index 2884187ccbbecfdc..71dc5a070065dd2a 100644
> > --- a/tools/perf/util/callchain.c
> > +++ b/tools/perf/util/callchain.c
> > @@ -1838,3 +1838,32 @@ int sample__for_each_callchain_node(struct thread *thread, struct evsel *evsel,
> > }
> > return 0;
> > }
> > +
> > +int sample__merge_deferred_callchain(struct perf_sample *sample_orig,
>
> nit: We use the term deferred rather than original except in this
> context. I think deferred is a little more intention revealing than
> original. Perhaps add a comment capturing that the original sample is
> the deferred kernel sample.
Sure, will add.
Thanks,
Namhyung
Powered by blists - more mailing lists