Date:   Wed, 8 Jun 2022 17:09:49 -0700
From:   Namhyung Kim <namhyung@...il.com>
To:     Xing Zhengjun <zhengjun.xing@...ux.intel.com>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, alexander.shishkin@...el.com,
        Jiri Olsa <jolsa@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-perf-users <linux-perf-users@...r.kernel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Kan Liang <kan.liang@...ux.intel.com>
Subject: Re: [PATCH 5/5] perf stat: Add topdown metrics in the default perf
 stat on the hybrid machine

On Tue, Jun 7, 2022 at 1:08 AM <zhengjun.xing@...ux.intel.com> wrote:
>
> From: Zhengjun Xing <zhengjun.xing@...ux.intel.com>
>
> Topdown metrics are missing from the default perf stat on hybrid machines,
> so add them to the default perf stat for hybrid systems.
>
> Currently, the default perf stat supports the Topdown perf metrics only
> for the p-core PMU; Topdown support for the e-core PMU will be implemented
> separately later. Refactor the code to add two x86-specific functions.
> Widen the event name column by 7 chars, so that all metrics after the "#"
> become aligned again.
>
> The perf metrics topdown feature is supported on the cpu_core of ADL. The
> dedicated perf metrics counter and the fixed counter 3 are used for the
> topdown events. Adding the topdown metrics doesn't trigger multiplexing.
>
> Before:
>
>  # ./perf  stat  -a true
>
>  Performance counter stats for 'system wide':
>
>              53.70 msec cpu-clock                 #   25.736 CPUs utilized
>                 80      context-switches          #    1.490 K/sec
>                 24      cpu-migrations            #  446.951 /sec
>                 52      page-faults               #  968.394 /sec
>          2,788,555      cpu_core/cycles/          #   51.931 M/sec
>            851,129      cpu_atom/cycles/          #   15.851 M/sec
>          2,974,030      cpu_core/instructions/    #   55.385 M/sec
>            416,919      cpu_atom/instructions/    #    7.764 M/sec
>            586,136      cpu_core/branches/        #   10.916 M/sec
>             79,872      cpu_atom/branches/        #    1.487 M/sec
>             14,220      cpu_core/branch-misses/   #  264.819 K/sec
>              7,691      cpu_atom/branch-misses/   #  143.229 K/sec
>
>        0.002086438 seconds time elapsed
>
> After:
>
>  # ./perf stat  -a true
>
>  Performance counter stats for 'system wide':
>
>              61.39 msec cpu-clock                        #   24.874 CPUs utilized
>                 76      context-switches                 #    1.238 K/sec
>                 24      cpu-migrations                   #  390.968 /sec
>                 52      page-faults                      #  847.097 /sec
>          2,753,695      cpu_core/cycles/                 #   44.859 M/sec
>            903,899      cpu_atom/cycles/                 #   14.725 M/sec
>          2,927,529      cpu_core/instructions/           #   47.690 M/sec
>            428,498      cpu_atom/instructions/           #    6.980 M/sec
>            581,299      cpu_core/branches/               #    9.470 M/sec
>             83,409      cpu_atom/branches/               #    1.359 M/sec
>             13,641      cpu_core/branch-misses/          #  222.216 K/sec
>              8,008      cpu_atom/branch-misses/          #  130.453 K/sec
>         14,761,308      cpu_core/slots/                  #  240.466 M/sec
>          3,288,625      cpu_core/topdown-retiring/       #     22.3% retiring
>          1,323,323      cpu_core/topdown-bad-spec/       #      9.0% bad speculation
>          5,477,470      cpu_core/topdown-fe-bound/       #     37.1% frontend bound
>          4,679,199      cpu_core/topdown-be-bound/       #     31.7% backend bound
>            646,194      cpu_core/topdown-heavy-ops/      #      4.4% heavy operations       #     17.9% light operations
>          1,244,999      cpu_core/topdown-br-mispredict/  #      8.4% branch mispredict      #      0.5% machine clears
>          3,891,800      cpu_core/topdown-fetch-lat/      #     26.4% fetch latency          #     10.7% fetch bandwidth
>          1,879,034      cpu_core/topdown-mem-bound/      #     12.7% memory bound           #     19.0% Core bound
>
>        0.002467839 seconds time elapsed
>
> Signed-off-by: Zhengjun Xing <zhengjun.xing@...ux.intel.com>
> Reviewed-by: Kan Liang <kan.liang@...ux.intel.com>
> ---
[SNIP]
> +const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
> +{
> +       const char *pmu_name = "cpu";
> +
> +       if (perf_pmu__has_hybrid()) {
> +               if (!evlist->hybrid_pmu_name) {
> +                       if (warn)
> +                               pr_warning
> +                                   ("WARNING: default to use cpu_core topdown events\n");
> +                       evlist->hybrid_pmu_name =
> +                           perf_pmu__hybrid_type_to_pmu("core");

This doesn't look good.  Please consider reducing the
indent level by returning early, e.g.:

    if (!perf_pmu__has_hybrid())
        return "cpu";

    if (!evlist->hybrid_pmu_name) {
        ...

Thanks,
Namhyung


> +               }
> +
> +               pmu_name = evlist->hybrid_pmu_name;
> +       }
> +       return pmu_name;
> +}
