Date: Wed, 26 Jun 2024 10:55:54 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: weilin.wang@...el.com
Cc: Ian Rogers <irogers@...gle.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	Perry Taylor <perry.taylor@...el.com>,
	Samantha Alt <samantha.alt@...el.com>,
	Caleb Biggers <caleb.biggers@...el.com>
Subject: Re: [RFC PATCH v14 6/8] perf stat: Add command line option for
 enabling tpebs recording

Hello Weilin,

On Mon, Jun 24, 2024 at 06:20:22PM -0400, weilin.wang@...el.com wrote:
> From: Weilin Wang <weilin.wang@...el.com>
> 
> With this command line option, tpebs recording is turned off in perf stat by
> default. It is only turned on when this option is given on the perf stat
> command line.
> 
> Example with --enable-tpebs-recording:

I prefer shorter names. How about --enable-tpebs, --record-tpebs, or
maybe just --tpebs?

Thanks,
Namhyung

> 
> perf stat -M tma_split_loads -C1-4 --enable-tpebs-recording sleep 1
> 
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.044 MB - ]
> 
>  Performance counter stats for 'CPU(s) 1-4':
> 
>     53,259,156,071      cpu_core/TOPDOWN.SLOTS/          #      1.6 %  tma_split_loads          (50.00%)
>     15,867,565,250      cpu_core/topdown-retiring/                                              (50.00%)
>     15,655,580,731      cpu_core/topdown-mem-bound/                                             (50.00%)
>     11,738,022,218      cpu_core/topdown-bad-spec/                                              (50.00%)
>      6,151,265,424      cpu_core/topdown-fe-bound/                                              (50.00%)
>     20,445,917,581      cpu_core/topdown-be-bound/                                              (50.00%)
>      6,925,098,013      cpu_core/L1D_PEND_MISS.PENDING/                                         (50.00%)
>      3,838,653,421      cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/                                        (50.00%)
>      4,797,059,783      cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/                                        (50.00%)
>     11,931,916,714      cpu_core/CPU_CLK_UNHALTED.THREAD/                                        (50.00%)
>        102,576,164      cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/                                        (50.00%)
>         64,071,854      cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/                                        (50.00%)
>                  3      cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/R
> 
>        1.003049679 seconds time elapsed
> 
> Example without --enable-tpebs-recording:
> 
> perf stat -M tma_contested_accesses -C1 sleep 1
> 
>  Performance counter stats for 'CPU(s) 1':
> 
>         50,203,891      cpu_core/TOPDOWN.SLOTS/          #      0.0 %  tma_contested_accesses   (63.60%)
>         10,040,777      cpu_core/topdown-retiring/                                              (63.60%)
>          6,890,729      cpu_core/topdown-mem-bound/                                             (63.60%)
>          2,756,463      cpu_core/topdown-bad-spec/                                              (63.60%)
>         10,828,288      cpu_core/topdown-fe-bound/                                              (63.60%)
>         28,350,432      cpu_core/topdown-be-bound/                                              (63.60%)
>                 98      cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/                                        (63.70%)
>            577,520      cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/                                        (54.62%)
>            313,339      cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/                                        (54.62%)
>             14,155      cpu_core/MEM_LOAD_RETIRED.L1_MISS/                                        (45.54%)
>                  0      cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/                                        (36.30%)
>          8,468,077      cpu_core/CPU_CLK_UNHALTED.THREAD/                                        (45.38%)
>                198      cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/                                        (45.38%)
>              8,324      cpu_core/MEM_LOAD_RETIRED.FB_HIT/                                        (45.38%)
>      3,388,031,520      TSC
>         23,226,785      cpu_core/CPU_CLK_UNHALTED.REF_TSC/                                        (54.46%)
>                 80      cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/                                        (54.46%)
>                  0      cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/R
>                  0      cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/R
>      1,006,816,667 ns   duration_time
> 
>        1.002537737 seconds time elapsed
> 
> Signed-off-by: Weilin Wang <weilin.wang@...el.com>
> ---
>  tools/perf/Documentation/perf-stat.txt | 8 ++++++++
>  tools/perf/builtin-stat.c              | 4 ++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 29756a87ab6f..f4cde834811d 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -498,6 +498,14 @@ To interpret the results it is usually needed to know on which
>  CPUs the workload runs on. If needed the CPUs can be forced using
>  taskset.
>  
> +--enable-tpebs-recording::
> +Enable automatic sampling on Intel TPEBS retire_latency events (event with :R
> +modifier). Without this option, perf would not capture dynamic retire_latency
> +at runtime. Currently, a zero value is assigned to the retire_latency event when
> +this option is not set. The TPEBS hardware feature starts from Intel Granite
> +Rapids microarchitecture. This option only exists in X86_64 and is meaningful on
> +Intel platforms with TPEBS feature.
> +
>  --td-level::
>  Print the top-down statistics that equal the input level. It allows
>  users to print the interested top-down metrics level instead of the
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 68125bd75b37..7111c96e68ab 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -2475,6 +2475,10 @@ int cmd_stat(int argc, const char **argv)
>  			"disable adding events for the metric threshold calculation"),
>  		OPT_BOOLEAN(0, "topdown", &topdown_run,
>  			"measure top-down statistics"),
> +#ifdef HAVE_ARCH_X86_64_SUPPORT
> +		OPT_BOOLEAN(0, "enable-tpebs-recording", &tpebs_recording,
> +			"enable recording for tpebs when retire_latency required"),
> +#endif
>  		OPT_UINTEGER(0, "td-level", &stat_config.topdown_level,
>  			"Set the metrics level for the top-down statistics (0: max level)"),
>  		OPT_BOOLEAN(0, "smi-cost", &smi_cost,
> -- 
> 2.43.0
> 
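
The option added in the patch above is a plain boolean that stays off unless the flag appears on the command line. As a rough standalone illustration of that default-off behaviour, the sketch below scans an argument vector for a flag name. It does not use perf's parse-options API: the helper name tpebs_requested() is hypothetical, and scanning every argument is a simplification, since a real parser would stop at the workload command. The flag string is passed in, so any of the spellings discussed in this thread can be tried.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of perf: returns true when `flag`
 * appears among the arguments, false otherwise. This mirrors a
 * boolean option that defaults to off.
 *
 * Simplification: every argument after argv[0] is scanned, including
 * the workload command and its arguments.
 */
static bool tpebs_requested(int argc, const char **argv, const char *flag)
{
	for (int i = 1; i < argc; i++) {
		if (strcmp(argv[i], flag) == 0)
			return true;
	}
	return false;
}
```

Called with the arguments of the first example command and "--enable-tpebs-recording", the helper returns true; called with the arguments of the second example, which omits the flag, it returns false.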
