linux-kernel - Re: [PATCH v1] perf test: Avoid hard coded metrics in stat std output test

Message-ID: <CAP-5=fV-mGX7QXT4G=VyzWjL5_AJuZ_69aj6JYbV8NVhuy-8TA@mail.gmail.com>
Date: Fri, 19 Apr 2024 08:23:22 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Yicong Yang <yangyicong@...ilicon.com>, 
	Athira Rajeev <atrajeev@...ux.vnet.ibm.com>, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] perf test: Avoid hard coded metrics in stat std output test

On Fri, Apr 19, 2024 at 8:09 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
>
> On 2024-04-19 10:40 a.m., Ian Rogers wrote:
> > On Fri, Apr 19, 2024 at 6:54 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2024-04-17 2:32 p.m., Ian Rogers wrote:
> >>> Hard coded metric names fail on ARM testing.
> >>>
> >>> Signed-off-by: Ian Rogers <irogers@...gle.com>
> >>> ---
> >>>  tools/perf/tests/shell/stat+std_output.sh | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/perf/tests/shell/stat+std_output.sh b/tools/perf/tests/shell/stat+std_output.sh
> >>> index cbf2894b2c84..845f83213855 100755
> >>> --- a/tools/perf/tests/shell/stat+std_output.sh
> >>> +++ b/tools/perf/tests/shell/stat+std_output.sh
> >>> @@ -13,7 +13,7 @@ stat_output=$(mktemp /tmp/__perf_test.stat_outputstd.XXXXX)
> >>>
> >>>  event_name=(cpu-clock task-clock context-switches cpu-migrations page-faults stalled-cycles-frontend stalled-cycles-backend cycles instructions branches branch-misses)
> >>>  event_metric=("CPUs utilized" "CPUs utilized" "/sec" "/sec" "/sec" "frontend cycles idle" "backend cycles idle" "GHz" "insn per cycle" "/sec" "of all branches")
> >>> -skip_metric=("stalled cycles per insn" "tma_" "retiring" "frontend_bound" "bad_speculation" "backend_bound")
> >>> +skip_metric=($(perf list --raw Default 2> /dev/null))
> >>
> >>
> >> The "perf list --raw Default" only gives the topdown metrics.
> >> The "stalled cycles per insn" is not covered.
> >> The check should skip the line of "stalled cycles per insn" as well.
> >>
> >>      3,856,436,920 stalled-cycles-frontend   #   74.09% frontend cycles idle
> >>      1,600,790,871 stalled-cycles-backend    #   30.75% backend  cycles idle
> >>      2,603,501,247 instructions              #    0.50  insns per cycle
> >>                                              #    1.48  stalled cycles per insn
> >>        484,357,498 branches                  #  283.455 M/sec
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/builtin-stat.c#n24
> >>
> >> Newer Intel CPUs don't have the stalled-cycles-* events, but it
> >> seems Power and older x86 CPUs do.
> >
> > Oh, sigh. This test should really ignore lines like that. How much do
> > we care about these metrics? The RISC-V event parsing change:
> > https://lore.kernel.org/lkml/20240416061533.921723-1-irogers@google.com/
> > means that legacy hardware events will be uncommon and we need to
> > adapt the hard coded metrics in stat-shadow.c to json ones. Once they
> > are json metrics they will be in Default.
>
> It seems that, except for newer Intel CPUs, all the other architectures
> support the two stalled-cycles-* events and the metric. Intel has the
> Topdown metrics instead, but this seems to be an important metric for
> the other architectures.
>
> RISC-V
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/riscv_pmu_sbi.c#n134
> Power
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/perf/power9-pmu.c#n279
> Arm
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_pmuv3.c#n53
>
> So almost all json files have to be updated. I'm not sure if it's a
> practical way to fix the issue.

So I'd very much like to get rid of the hard coded metrics:
 - they don't use or respect event groups,
 - their ad hoc printing can introduce extra metric results
   unexpectedly in output,
 - they fall outside of optimizations like Weilin's metric event grouping work.
I'm hoping the python json generation of metrics makes their removal practical:
https://lore.kernel.org/lkml/20240314055919.1979781-1-irogers@google.com/

That's a lot to get landed for this fix:
 - 40+ patches for python based json generation.
 - 10+ patches for parse events changes.
So I think a version that simply hard codes a skip for the hard coded
metrics is in order.
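
Roughly, and only as a sketch rather than the actual follow-up patch, that
could mean keeping "stalled cycles per insn" as a hard coded entry and
appending whatever "perf list --raw Default" reports. The names
hard_coded_skip, default_metrics and should_skip_line below are made up for
illustration and are not in the test today:

```shell
#!/bin/bash
# Sketch only: hypothetical names, not the actual stat+std_output.sh change.

# Metric printed by the hard coded stat-shadow.c path, which
# "perf list --raw Default" does not report.
hard_coded_skip=("stalled cycles per insn")

# Metrics in the Default group, if perf is installed; otherwise left empty.
default_metrics=()
if command -v perf > /dev/null 2>&1; then
  # Word splitting is intentional: one metric name per word.
  # shellcheck disable=SC2207
  default_metrics=($(perf list --raw Default 2> /dev/null))
fi

skip_metric=("${hard_coded_skip[@]}" "${default_metrics[@]}")

# Hypothetical helper: succeeds if the line mentions any skipped metric.
should_skip_line() {
  local line="$1" metric
  for metric in "${skip_metric[@]}"; do
    [[ "${line}" == *"${metric}"* ]] && return 0
  done
  return 1
}
```

With that, the "#    1.48  stalled cycles per insn" continuation line from
Kan's output would be skipped on machines that still have the
stalled-cycles-* events, and the JSON metric names would still come from
perf list instead of being listed by hand.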

Thanks,
Ian

> Thanks,
> Kan
> >
> > Thanks,
> > Ian
> >
> >> Thanks,
> >> Kan
> >>
> >>>
> >>>  cleanup() {
> >>>    rm -f "${stat_output}"
> >