linux-kernel - Re: [PATCH v1] perf stat: Combine the -A/--no-aggr and --no-merge options

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cd3c9e63-1ae8-4106-b4ed-e1332b29a4ce@linux.intel.com>
Date:   Thu, 14 Dec 2023 10:10:57 -0500
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Ian Rogers <irogers@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
        Kaige Ye <ye@...ge.org>, James Clark <james.clark@....com>,
        Changbin Du <changbin.du@...wei.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        John Garry <john.g.garry@...cle.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] perf stat: Combine the -A/--no-aggr and --no-merge
 options



On 2023-12-14 1:02 a.m., Ian Rogers wrote:
> The -A or --no-aggr option disables aggregation of core events:
> ```
> $ perf stat -A -e cycles,data_total -a true
> 
>  Performance counter stats for 'system wide':
> 
> CPU0            1,287,665      cycles
> CPU1            1,831,681      cycles
> CPU2           27,345,998      cycles
> CPU3            1,964,799      cycles
> CPU4              236,174      cycles
> CPU5            3,302,825      cycles
> CPU6            9,201,446      cycles
> CPU7            1,403,043      cycles
> CPU0               110.90 MiB  data_total
> 
>        0.008961761 seconds time elapsed
> ```
> 
> The --no-merge option disables the aggregation of uncore events:
> ```
> $ perf stat --no-merge -e cycles,data_total -a true
> 
>  Performance counter stats for 'system wide':
> 
>         38,482,778      cycles
>              15.04 MiB  data_total [uncore_imc_free_running_1]
>              15.00 MiB  data_total [uncore_imc_free_running_0]
> 
>        0.005915155 seconds time elapsed
> ```
> 
> Having two options confuses users who generally don't appreciate the
> difference in PMUs. Keep all the options but make it so they all
> disable aggregation both of core and uncore events:
> ```
> $ perf stat -A -e cycles,data_total -a true
> 
>  Performance counter stats for 'system wide':
> 
> CPU0               85,878      cycles
> CPU1               88,179      cycles
> CPU2               60,872      cycles
> CPU3            3,265,567      cycles
> CPU4               82,357      cycles
> CPU5               83,383      cycles
> CPU6               84,156      cycles
> CPU7              220,803      cycles
> CPU0                 2.38 MiB  data_total [uncore_imc_free_running_0]
> CPU0                 2.38 MiB  data_total [uncore_imc_free_running_1]
> 
>        0.001397205 seconds time elapsed
> ```
> 
> Update the relevant perf-stat man page information.
> 
> Signed-off-by: Ian Rogers <irogers@...gle.com>

Reviewed-by: Kan Liang <kan.liang@...ux.intel.com>

Thanks,
Kan

> ---
>  tools/perf/Documentation/perf-stat.txt | 52 ++++++++++++++------------
>  tools/perf/builtin-stat.c              |  5 ++-
>  tools/perf/util/stat-display.c         |  2 +-
>  tools/perf/util/stat.c                 |  2 +-
>  tools/perf/util/stat.h                 |  1 -
>  5 files changed, 33 insertions(+), 29 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 8f789fa1242e..5af2e432b54f 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -422,7 +422,34 @@ See perf list output for the possible metrics and metricgroups.
>  
>  -A::
>  --no-aggr::
> -Do not aggregate counts across all monitored CPUs.
> +--no-merge::
> +Do not aggregate/merge counts across monitored CPUs or PMUs.
> +
> +When multiple events are created from a single event specification,
> +stat will, by default, aggregate the event counts and show the result
> +in a single row. This option disables that behavior and shows the
> +individual events and counts.
> +
> +Multiple events are created from a single event specification when:
> +
> +1. PID monitoring isn't requested and the system has more than one
> +   CPU. For example, a system with 8 SMT threads will have one event
> +   opened on each thread and aggregation is performed across them.
> +
> +2. Prefix or glob wildcard matching is used for the PMU name. For
> +   example, multiple memory controller PMUs may exist typically with a
> +   suffix of _0, _1, etc. By default the event counts will all be
> +   combined if the PMU is specified without the suffix such as
> +   uncore_imc rather than uncore_imc_0.
> +
> +3. Aliases, which are listed immediately after the Kernel PMU events
> +   by perf list, are used.
> +
> +--hybrid-merge::
> +Merge core event counts from all core PMUs. In hybrid or big.LITTLE
> +systems by default each core PMU will report its count
> +separately. This option forces core PMU counts to be combined to give
> +a behavior closer to having a single CPU type in the system.
>  
>  --topdown::
>  Print top-down metrics supported by the CPU. This allows to determine
> @@ -475,29 +502,6 @@ highlight 'tma_frontend_bound'. This metric may be drilled into with
>  
>  Error out if the input is higher than the supported max level.
>  
> ---no-merge::
> -Do not merge results from same PMUs.
> -
> -When multiple events are created from a single event specification,
> -stat will, by default, aggregate the event counts and show the result
> -in a single row. This option disables that behavior and shows
> -the individual events and counts.
> -
> -Multiple events are created from a single event specification when:
> -1. Prefix or glob matching is used for the PMU name.
> -2. Aliases, which are listed immediately after the Kernel PMU events
> -   by perf list, are used.
> -
> ---hybrid-merge::
> -Merge the hybrid event counts from all PMUs.
> -
> -For hybrid events, by default, the stat aggregates and reports the event
> -counts per PMU. But sometimes, it's also useful to aggregate event counts
> -from all PMUs. This option enables that behavior and reports the counts
> -without PMUs.
> -
> -For non-hybrid events, it should be no effect.
> -
>  --smi-cost::
>  Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.
>  
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index bda020c0b9d5..5fe9abc6a524 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1204,8 +1204,9 @@ static struct option stat_options[] = {
>  	OPT_STRING('C', "cpu", &target.cpu_list, "cpu",
>  		    "list of cpus to monitor in system-wide"),
>  	OPT_SET_UINT('A', "no-aggr", &stat_config.aggr_mode,
> -		    "disable CPU count aggregation", AGGR_NONE),
> -	OPT_BOOLEAN(0, "no-merge", &stat_config.no_merge, "Do not merge identical named events"),
> +		    "disable aggregation across CPUs or PMUs", AGGR_NONE),
> +	OPT_SET_UINT(0, "no-merge", &stat_config.aggr_mode,
> +		    "disable aggregation the same as -A or -no-aggr", AGGR_NONE),
>  	OPT_BOOLEAN(0, "hybrid-merge", &stat_config.hybrid_merge,
>  		    "Merge identical named hybrid events"),
>  	OPT_STRING('x', "field-separator", &stat_config.csv_sep, "separator",
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index afe6db8e7bf4..8c61f8627ebc 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -898,7 +898,7 @@ static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config
>  
>  static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
>  {
> -	if (config->no_merge || hybrid_uniquify(counter, config))
> +	if (config->aggr_mode == AGGR_NONE || hybrid_uniquify(counter, config))
>  		uniquify_event_name(counter);
>  }
>  
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 012c4946b9c4..b0bcf92f0f9c 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -592,7 +592,7 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
>  {
>  	struct evsel *evsel;
>  
> -	if (config->no_merge)
> +	if (config->aggr_mode == AGGR_NONE)
>  		return;
>  
>  	evlist__for_each_entry(evlist, evsel)
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 325d0fad1842..4357ba114822 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -76,7 +76,6 @@ struct perf_stat_config {
>  	bool			 null_run;
>  	bool			 ru_display;
>  	bool			 big_num;
> -	bool			 no_merge;
>  	bool			 hybrid_merge;
>  	bool			 walltime_run_table;
>  	bool			 all_kernel;