Message-ID: <aBo_z8Q87gflYyuX@x1>
Date: Tue, 6 May 2025 13:58:55 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Namhyung Kim <namhyung@...nel.org>, Ingo Molnar <mingo@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in
 system-wide mode

On Tue, May 06, 2025 at 11:57:19AM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, May 05, 2025 at 10:03:05AM +0200, Dmitry Vyukov wrote:
> > On Sun, 4 May 2025 at 21:52, Namhyung Kim <namhyung@...nel.org> wrote:
> > > On Sun, May 04, 2025 at 10:22:26AM +0200, Ingo Molnar wrote:
> > > > > Here's an example:
>
> > > > > # perf record -a -- perf bench sched messaging
>
> > > > > This basically forks the sender and receiver tasks for the run.
>
> > > > > # perf report --latency -s comm --stdio
> > > > > ...
> > > > > #
> > > > > # Latency Overhead Command
> > > > > # ........ ........ ...............
> > > > > #
> > > > > 98.14% 95.97% sched-messaging
> > > > > 0.78% 0.93% gnome-shell
> > > > > 0.36% 0.34% ptyxis
> > > > > 0.23% 0.23% kworker/u112:0-
> > > > > 0.23% 0.44% perf
> > > > > 0.08% 0.10% KMS thread
> > > > > 0.05% 0.05% rcu_preempt
> > > > > 0.05% 0.05% kworker/u113:2-
> > > > > ...
>
> > > > Just a generic user-interface comment: I had to look up what 'latency'
> > > > means in this context, and went about 3 hops deep into various pieces
> > > > of description until I found Documentation/cpu-and-latency-overheads.txt,
> > > > where after a bit of head-scratching I realized that 'latency' is a
> > > > weird alias for 'wall-clock time'...
>
> > > > This is *highly* confusing terminology IMHO.
>
> > > Sorry for the confusion. I know I'm terrible at naming things. :)
>
> > > Actually Dmitry used the term 'wall-clock' profiling at first when he
> > > implemented this feature, but I thought it was not clear what it would
> > > mean for non-cycle events. As 'overhead' is also a generic term, we
> > > ended up with 'latency'.
>
> > Exactly :)
>
> So, the 'cycles' event is the most commonly used; it's the default if one
> does 'perf record' and doesn't ask for a specific event.
>
> When we notice that both 'cycles' and 'instructions' are counted, we
> provide an IPC, as that is a well-known metric for that combo:
>
> root@...ber:~# perf stat -e cycles,instructions sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 1,149,493 cycles
> 1,197,279 instructions # 1.04 insn per cycle
>
> 1.000645408 seconds time elapsed
>
> 0.000599000 seconds user
> 0.000000000 seconds sys
>
>
> root@...ber:~#
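>
> (The IPC shown is simply instructions / cycles:
> 1,197,279 / 1,149,493 = ~1.04 insn per cycle.)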
>
> So maybe when we notice that 'cycles' was used, 'perf report/top' should
> use the term 'wall-clock' for the column name?
>
> So instead of having a --latency option, we could have a hint that tells
> the user that if this knob was set:
>
> $ perf config report.wall-clock=true
>
> Then if 'cycles' is present we would have:

Something along the lines of the patch below, but there are several
details to take into account with what is in the current codebase: that
knob should only be used or suggested to the user if what is needed for
the latency/wall-clock accounting is present in the perf.data file being
read, so some refactorings are needed. I'll try to follow up on it.

But just for reference:

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f0299c7ee0254a37..20874800d967ffb5 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -51,6 +51,7 @@
 #include "util/util.h" // perf_tip()
 #include "ui/ui.h"
 #include "ui/progress.h"
+#include "ui/util.h"
 #include "util/block-info.h"
 
 #include <dlfcn.h>
@@ -127,6 +128,11 @@ static int report__config(const char *var, const char *value, void *cb)
 {
 	struct report *rep = cb;
 
+	if (!strcmp(var, "report.prefer_latency")) {
+		symbol_conf.prefer_latency = perf_config_bool(var, value);
+		symbol_conf.prefer_latency_set = true;
+		return 0;
+	}
 	if (!strcmp(var, "report.group")) {
 		symbol_conf.event_group = perf_config_bool(var, value);
 		return 0;
@@ -1734,6 +1740,15 @@ int cmd_report(int argc, const char **argv)
 		symbol_conf.annotate_data_sample = true;
 	}
 
+	if (!symbol_conf.prefer_latency) {
+		if (ui__dialog_yesno("Do you want to use latency mode?\n"
+				     "That will add a 'Latency' column that would mean\n"
+				     "'wall-clock' time when used with cycles, for instance\n")) {
+			symbol_conf.prefer_latency = symbol_conf.prefer_latency_set = true;
+			perf_config__set_variable("report.prefer_latency", "yes");
+		}
+	}
+
 	symbol_conf.enable_latency = true;
 	if (report.disable_order || !perf_session__has_switch_events(session)) {
 		if (symbol_conf.parallelism_list_str ||
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index cd9aa82c7d5ad941..f87071f5dedd94ca 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -51,6 +51,7 @@ struct symbol_conf {
 			annotate_data_sample,
 			skip_empty,
 			enable_latency,
+			prefer_latency_set,
 			prefer_latency;
 	const char	*vmlinux_name,
 			*kallsyms_name,
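
One of the refactorings mentioned above would be to ask the question only
when the perf.data file actually has what the latency accounting needs. A
rough, untested sketch of that gating, reusing
perf_session__has_switch_events() and the other bits from the patch above
(exact placement in cmd_report() still to be decided):

	/*
	 * Only suggest latency mode if the user hasn't already expressed a
	 * preference and the perf.data file has the context switch events
	 * that the wall-clock/latency accounting needs.
	 */
	if (!symbol_conf.prefer_latency_set &&
	    perf_session__has_switch_events(session) &&
	    ui__dialog_yesno("Do you want to use latency mode?\n"
			     "That will add a 'Latency' column that would mean\n"
			     "'wall-clock' time when used with cycles, for instance\n")) {
		symbol_conf.prefer_latency = symbol_conf.prefer_latency_set = true;
		perf_config__set_variable("report.prefer_latency", "yes");
	}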
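
For completeness, the perf_config__set_variable() call persists the answer
in the user's ~/.perfconfig (git-config style), roughly as:

	[report]
		prefer_latency = yes

so later 'perf report' sessions would pick latency mode without asking
again.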