linux-kernel - Re: [PATCH] perf record: add a shortcut for metrics

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <CAP-5=fWmmtagTVfacFZgdhughvU--Dz0=jkoqFB8CP1Qd3o3Yw@mail.gmail.com>
Date: Mon, 27 May 2024 22:01:37 -0700
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Artem Savkov <asavkov@...hat.com>, linux-perf-users@...r.kernel.org, 
	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, "Liang, Kan" <kan.liang@...ux.intel.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf record: add a shortcut for metrics

On Mon, May 27, 2024 at 10:46 AM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> On Mon, May 27, 2024 at 02:28:32PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, May 27, 2024 at 02:04:54PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, May 27, 2024 at 02:02:33PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Mon, May 27, 2024 at 12:15:19PM +0200, Artem Savkov wrote:
> > > > > Add -M/--metrics option to perf-record providing a shortcut to record
> > > > > metrics and metricgroups. This option mirrors the one in perf-stat.
> >
> > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@...nel.org>
> > > > > Signed-off-by: Artem Savkov <asavkov@...hat.com>
>
> > How did you test this?
>
> > The idea, from my notes, was to be able to have extra columns in 'perf
> > report' with things like IPC and other metrics. Probably not all metrics
> > will apply, so we need a way to find out which ones are OK for that
> > purpose, for instance:
>
> One that may make sense:
>
> root@...ber:~# perf record -M tma_fb_full
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.846 MB perf.data (21745 samples) ]
>
> root@...ber:~# perf evlist
> cpu_core/CPU_CLK_UNHALTED.THREAD/
> cpu_core/L1D_PEND_MISS.FB_FULL/
> dummy:u
> root@...ber:~#
>
> But then we need to read both to do the math, maybe something like:
>
> root@...ber:~# perf record -e '{cpu_core/CPU_CLK_UNHALTED.THREAD/,cpu_core/L1D_PEND_MISS.FB_FULL/}:S'
> ^C[ perf record: Woken up 40 times to write data ]
> [ perf record: Captured and wrote 14.640 MB perf.data (219990 samples) ]
>
> root@...ber:~# perf script | head
>     cc1plus 1339704 [000] 36028.995981:  2011389 cpu_core/CPU_CLK_UNHALTED.THREAD/:           1097303 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
>     cc1plus 1339704 [000] 36028.995981:    26231   cpu_core/L1D_PEND_MISSFB_FULL/:           1097303 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
>     cc1plus 1340011 [001] 36028.996008:  2004568 cpu_core/CPU_CLK_UNHALTED.THREAD/:            8c23b4 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
>     cc1plus 1340011 [001] 36028.996008:    20113   cpu_core/L1D_PEND_MISSFB_FULL/:            8c23b4 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
>       clang 1340462 [002] 36028.996043:  2007356 cpu_core/CPU_CLK_UNHALTED.THREAD/:  ffffffffb43b045d release_pages+0x3dd ([kernel.kallsyms])
>       clang 1340462 [002] 36028.996043:    23481   cpu_core/L1D_PEND_MISSFB_FULL/:  ffffffffb43b045d release_pages+0x3dd ([kernel.kallsyms])
>     cc1plus 1339622 [003] 36028.996066:  2004148 cpu_core/CPU_CLK_UNHALTED.THREAD/:            760874 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
>     cc1plus 1339622 [003] 36028.996066:    31935   cpu_core/L1D_PEND_MISSFB_FULL/:            760874 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
>          as 1340513 [004] 36028.996097:  2005052 cpu_core/CPU_CLK_UNHALTED.THREAD/:  ffffffffb4491d65 __count_memcg_events+0x55 ([kernel.kallsyms])
>          as 1340513 [004] 36028.996097:    45084   cpu_core/L1D_PEND_MISSFB_FULL/:  ffffffffb4491d65 __count_memcg_events+0x55 ([kernel.kallsyms])
> root@...ber:~#
>
> root@...ber:~# perf report --stdio -F +period | head -20
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 219K of events 'anon group { cpu_core/CPU_CLK_UNHALTED.THREAD/, cpu_core/L1D_PEND_MISS.FB_FULL/ }'
> # Event count (approx.): 216528524863
> #
> #         Overhead                Period  Command    Shared Object      Symbol
> # ................  ....................  .........  .................  ...................................
> #
>      4.01%   1.09%  8538169256  39826572  podman     [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
>      1.35%   1.17%  2863376078  42829266  cc1plus    cc1plus            [] 0x00000000003f6bcc
>      0.94%   0.78%  1990639149  28408591  cc1plus    cc1plus            [] 0x00000000003f6be4
>      0.65%   0.17%  1375916283   6109515  podman     [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      0.61%   0.99%  1304418325  36198834  cc1plus    [kernel.kallsyms]  [k] get_mem_cgroup_from_mm
>      0.52%   0.42%  1103054030  15427418  cc1plus    cc1plus            [] 0x0000000000ca6c69
>      0.51%   0.17%  1094200572   6299289  podman     [kernel.kallsyms]  [k] psi_group_change
>      0.42%   0.41%   893633315  14778675  cc1plus    cc1plus            [] 0x00000000018afafe
>      0.42%   1.29%   887664793  47046952  cc1plus    [kernel.kallsyms]  [k] asm_exc_page_fault
> root@...ber:~#
>
> That 'tma_fb_full' metric then would be another column, calculated from
> the sampled components of its metric equation:
>
> root@...ber:~# perf list tma_fb_full | head
>
> Metric Groups:
>
> MemoryBW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
>   tma_fb_full
>        [This metric does a *rough estimation* of how often L1D Fill Buffer
>         unavailability limited additional L1D miss memory access requests to
>         proceed]
>
> TopdownL4: [Metrics for top-down breakdown at level 4]
> root@...ber:~#
>
> This is roughly what we brainstormed to support metrics in tools other
> than 'perf stat', but we need to check the possibilities and limitations
> of such an idea. Hopefully this discussion will help with that.
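
A rough sketch of the "extra column" idea, assuming tma_fb_full reduces to L1D_PEND_MISS.FB_FULL divided by CPU_CLK_UNHALTED.THREAD (the two events `perf evlist` shows above; the metric's real definition in the pmu-events JSON may differ by CPU model). With leader sampling (`:S`) each sample reads both counters, so the two Period columns printed by `perf report --stdio -F +period` are per-symbol totals that can be divided. `fb_full_estimates` is a hypothetical helper, not part of perf, and it assumes the column layout of the report above, with no spaces inside command or DSO names.

```python
# Hypothetical helper, not part of perf: estimate tma_fb_full per symbol from
# the two Period columns of `perf report --stdio -F +period` for the group
#   {cpu_core/CPU_CLK_UNHALTED.THREAD/,cpu_core/L1D_PEND_MISS.FB_FULL/}:S
# ASSUMPTION: the metric reduces to FB_FULL cycles / unhalted thread cycles.


def fb_full_estimates(report: str) -> list[tuple[str, str, float]]:
    """Return (command, symbol, estimated ratio) for each data row."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Skip '#' header lines and anything too short to be a data row.
        if not fields or fields[0].startswith("#") or len(fields) < 7:
            continue
        try:
            clk_period = int(fields[2])      # leader: CPU_CLK_UNHALTED.THREAD
            fb_full_period = int(fields[3])  # member: L1D_PEND_MISS.FB_FULL
        except ValueError:
            continue
        if clk_period == 0:
            continue
        command = fields[4]
        symbol = " ".join(fields[6:])        # e.g. "[k] native_queued_spin_lock_slowpath"
        rows.append((command, symbol, fb_full_period / clk_period))
    return rows


# Two rows copied from the report output above.
report = """\
     4.01%   1.09%  8538169256  39826572  podman     [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
     1.35%   1.17%  2863376078  42829266  cc1plus    cc1plus            [] 0x00000000003f6bcc
"""

for command, symbol, ratio in fb_full_estimates(report):
    print(f"{ratio:8.4%}  {command:10} {symbol}")
```

Summing periods per symbol and then dividing weights each symbol by how often it was sampled; whether that is a meaningful per-symbol value for a given metric is exactly the open question raised above.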

Putting metrics next to code in perf report/annotate sounds good to
me; opening all of a metric's events as if we want to sample on them,
less so. Metrics don't currently work with `perf stat record`. I think
Kan may have volunteered for that, and it seems more urgent than
expanding `perf record`. Presumably the way metrics get recorded there
could also benefit this effort.

If you look at the TMA metrics, a number of them have a "Sample with"
hint in their description. For example:
```
$ perf list -v
..
  tma_branch_mispredicts
       [This metric represents fraction of slots the CPU has wasted due to Branch Misprediction.
        These slots are either wasted by uops fetched from an incorrectly speculated program path;
        or stalls when the out-of-order part of the machine needs to recover its state from a
        speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:
        tma_info_bad_spec_branch_misprediction_cost,tma_info_bottleneck_mispredictions,
        tma_mispredicts_resteers]
..
```
It could be logical for `perf record -M tma_branch_mispredicts ...` to
be translated to `perf record -e BR_MISP_RETIRED.ALL_BRANCHES ...`
rather than doing any form of counting.
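
A minimal sketch of that translation, assuming the hint is read from the `perf list -v` description text: pull out the event named after "Sample with:" and use it as the `perf record -e` argument. `sample_events` and `record_args` are hypothetical names, this is not how perf currently implements `-M`, and treating ';' as the separator for multi-event hints is an assumption.

```python
import re
from typing import Optional

# Hypothetical translation, not existing perf behaviour: turn a metric's
# "Sample with:" hint into `perf record -e` arguments.
# ASSUMPTION: several events in one hint are separated by ';'.
_SAMPLE_WITH = re.compile(r"Sample with:\s*(\S+)")


def sample_events(description: str) -> list[str]:
    """Extract the suggested sampling events from a metric description."""
    # perf list wraps long descriptions, so collapse whitespace first.
    text = " ".join(description.split())
    match = _SAMPLE_WITH.search(text)
    if match is None:
        return []
    # The hint ends a sentence, so drop the trailing period.
    hint = match.group(1).rstrip(".")
    return [event for event in hint.split(";") if event]


def record_args(description: str, extra: list[str]) -> Optional[list[str]]:
    """Build a `perf record -e EVENTS ...` argv, or None if there is no hint."""
    events = sample_events(description)
    if not events:
        return None  # no hint: the caller must pick another strategy
    return ["perf", "record", "-e", ",".join(events), *extra]


desc = """speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES. Related metrics:
        tma_info_bad_spec_branch_misprediction_cost,tma_info_bottleneck_mispredictions,
        tma_mispredicts_resteers]"""

print(record_args(desc, ["-a", "sleep", "1"]))
# prints ['perf', 'record', '-e', 'BR_MISP_RETIRED.ALL_BRANCHES', '-a', 'sleep', '1']
```

Metrics without a "Sample with" hint return None here, which leaves open what `-M` should do for them.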

Thanks,
Ian