Message-ID: <CAP-5=fVaC662z2nJnTwBHMu-RE+9HkRmvSyNT5qB448+zw6e9Q@mail.gmail.com>
Date: Mon, 12 May 2025 14:52:07 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Falcon, Thomas" <thomas.falcon@...el.com>, "Wang, Weilin" <weilin.wang@...el.com>
Cc: "james.clark@...aro.org" <james.clark@...aro.org>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>, "mark.rutland@....com" <mark.rutland@....com>,
"mingo@...hat.com" <mingo@...hat.com>, "Hunter, Adrian" <adrian.hunter@...el.com>,
"acme@...nel.org" <acme@...nel.org>, "namhyung@...nel.org" <namhyung@...nel.org>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>, "jolsa@...nel.org" <jolsa@...nel.org>
Subject: Re: [PATCH v1 2/2] perf test: Hybrid improvements for metric value
validation test
On Mon, May 12, 2025 at 1:30 PM Falcon, Thomas <thomas.falcon@...el.com> wrote:
>
> On Mon, 2025-05-12 at 11:47 -0700, Ian Rogers wrote:
> > On my alderlake I currently see for the "perf metrics value validation" test:
> > ```
> > Total Test Count: 142
> > Passed Test Count: 139
> > [
> > Metric Relationship Error: The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
> > is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [tma_frontend_bound, tma_frontend_bound]
> > Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> > Metric Relationship Error: The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
> > is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [tma_backend_bound, tma_backend_bound]
> > Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> > Metric Relationship Error: The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
> > is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [tma_retiring, tma_retiring]
> > Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
> > Metric validation return with erros. Please check metrics reported with errors.
> > ```
> > I suspect it is due to two metrics for different CPU types being
> > enabled. Add a -cputype option to avoid this. The test still fails with:
> > ```
> > Total Test Count: 115
> > Passed Test Count: 114
> > [
> > Wrong Metric Value Error: The collected value of metric ['tma_l2_hit_latency']
> > is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [0, 100]]
> > Metric validation return with errors. Please check metrics reported with errors.
> > ```
> > which is a reproducible genuine error and likely requires a metric fix.
>
> Hi Ian, I tested this on my alder lake and an arrow lake. All tests, including tma_l2_hit_latency,
> pass on my end.
>
> Tested-by: Thomas Falcon <thomas.falcon@...el.com>
Thanks Thomas! It should also work for core_lowpower on ArrowLake. I
find that tma_l2_hit_latency sometimes passes for me. Trying a few
more times I see other failures, but they all seem to be "No Metric
Value Error" - perhaps these shouldn't fail the test. In the testing
code we're passing '-a' for system-wide profiling:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/lib/perf_metric_validation.py?h=perf-tools-next#n380
I believe this is done so that counters for things like AVX will
gather values. I wonder if the tma_l2_hit_latency issue is happening
due to scaled counts:
```
$ sudo /tmp/perf/perf stat -M tma_l2_hit_latency -a sleep 1

 Performance counter stats for 'system wide':

     7,987,903,325      cpu_core/TOPDOWN.SLOTS/                   #   210.2 %  tma_l2_hit_latency       (87.27%)
     3,131,119,398      cpu_core/topdown-retiring/                                                      (87.27%)
     1,910,718,811      cpu_core/topdown-mem-bound/                                                     (87.27%)
       481,456,610      cpu_core/topdown-bad-spec/                                                      (87.27%)
     1,681,347,944      cpu_core/topdown-fe-bound/                                                      (87.27%)
     2,798,109,902      cpu_core/topdown-be-bound/                                                      (87.27%)
       365,736,554      cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/                                       (87.27%)
       327,668,588      cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/                                        (87.30%)
        12,744,464      cpu_core/MEM_LOAD_RETIRED.L1_MISS/                                              (75.32%)
     1,403,250,041      cpu_core/CPU_CLK_UNHALTED.THREAD/                                               (87.65%)
         6,657,480      cpu_core/MEM_LOAD_RETIRED.L2_HIT/                                               (87.66%)
    59,424,499,192      TSC
        40,830,608      cpu_core/MEM_LOAD_RETIRED.FB_HIT/                                               (62.46%)
     1,461,544,380      cpu_core/CPU_CLK_UNHALTED.REF_TSC/                                              (74.79%)
     1,008,604,319      duration_time

       1.004974560 seconds time elapsed
```
The values in the parentheses are a scaling amount that would
normally indicate event multiplexing, but for hybrid the events
aren't running while execution is on the other core type, so we're
seeing these odd multiplexing values and they are used to scale the
counts:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evsel.c?h=perf-tools-next#n599
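For illustration, here's a rough Python sketch of that scaling (the
helper name and shape are made up here, it's not the actual evsel.c
code):
```python
# Sketch of the multiplexing scaling perf applies to a raw count: the
# kernel reports the raw value along with how long the event was
# enabled and how long it actually ran on a counter, and the count is
# extrapolated to the full enabled time.
def scale_count(raw_count: int, time_enabled: int, time_running: int) -> float:
    if time_running == 0:
        return 0.0  # the event never made it onto a counter
    return raw_count * time_enabled / time_running
```
The figure perf stat prints in parentheses is
time_running/time_enabled (e.g. the 87.27% above), so a skewed ratio
also skews the scaled counts that feed the metric.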
I find that when I run a benchmark rather than "sleep" the issue
seems harder to reproduce.
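For reference, with this patch the validation script restricts each
run to a single PMU, so on a hybrid system it prints and runs
commands of this form (metric and PMU name here are just examples):
```
perf stat --cputype cpu_core -j -M tma_l2_hit_latency -a perf bench futex hash -r 2 -s
```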
Thanks,
Ian
> Thanks,
> Tom
> >
> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> > .../tests/shell/lib/perf_metric_validation.py | 12 +++++++++---
> > tools/perf/tests/shell/stat_metrics_values.sh | 17 +++++++++++------
> > 2 files changed, 20 insertions(+), 9 deletions(-)
> >
> > diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py
> > index 0b94216c9c46..dea8ef1977bf 100644
> > --- a/tools/perf/tests/shell/lib/perf_metric_validation.py
> > +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py
> > @@ -35,7 +35,8 @@ class TestError:
> >
> >
> > class Validator:
> > - def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='', workload='true', metrics=''):
> > + def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='',
> > + workload='true', metrics='', cputype='cpu'):
> > self.rulefname = rulefname
> > self.reportfname = reportfname
> > self.rules = None
> > @@ -43,6 +44,7 @@ class Validator:
> > self.metrics = self.__set_metrics(metrics)
> > self.skiplist = set()
> > self.tolerance = t
> > + self.cputype = cputype
> >
> > self.workloads = [x for x in workload.split(",") if x]
> > self.wlidx = 0 # idx of current workloads
> > @@ -377,7 +379,7 @@ class Validator:
> >
> > def _run_perf(self, metric, workload: str):
> > tool = 'perf'
> > - command = [tool, 'stat', '-j', '-M', f"{metric}", "-a"]
> > + command = [tool, 'stat', '--cputype', self.cputype, '-j', '-M', f"{metric}", "-a"]
> > wl = workload.split()
> > command.extend(wl)
> > print(" ".join(command))
> > @@ -443,6 +445,8 @@ class Validator:
> > if 'MetricName' not in m:
> > print("Warning: no metric name")
> > continue
> > + if 'Unit' in m and m['Unit'] != self.cputype:
> > + continue
> > name = m['MetricName'].lower()
> > self.metrics.add(name)
> > if 'ScaleUnit' in m and (m['ScaleUnit'] == '1%' or m['ScaleUnit'] == '100%'):
> > @@ -578,6 +582,8 @@ def main() -> None:
> > parser.add_argument(
> > "-wl", help="Workload to run while data collection", default="true")
> > parser.add_argument("-m", help="Metric list to validate", default="")
> > + parser.add_argument("-cputype", help="Only test metrics for the given CPU/PMU type",
> > + default="cpu")
> > args = parser.parse_args()
> > outpath = Path(args.output_dir)
> > reportf = Path.joinpath(outpath, 'perf_report.json')
> > @@ -586,7 +592,7 @@ def main() -> None:
> >
> > validator = Validator(args.rule, reportf, debug=args.debug,
> > datafname=datafile, fullrulefname=fullrule, workload=args.wl,
> > - metrics=args.m)
> > + metrics=args.m, cputype=args.cputype)
> > ret = validator.test()
> >
> > return ret
> > diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh
> > index 279f19c5919a..30566f0b5427 100755
> > --- a/tools/perf/tests/shell/stat_metrics_values.sh
> > +++ b/tools/perf/tests/shell/stat_metrics_values.sh
> > @@ -16,11 +16,16 @@ workload="perf bench futex hash -r 2 -s"
> > # Add -debug, save data file and full rule file
> > echo "Launch python validation script $pythonvalidator"
> > echo "Output will be stored in: $tmpdir"
> > -$PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}"
> > -ret=$?
> > -rm -rf $tmpdir
> > -if [ $ret -ne 0 ]; then
> > - echo "Metric validation return with erros. Please check metrics reported with errors."
> > -fi
> > +for cputype in /sys/bus/event_source/devices/cpu_*; do
> > + cputype=$(basename "$cputype")
> > + echo "Testing metrics for: $cputype"
> > + $PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}" \
> > + -cputype "${cputype}"
> > + ret=$?
> > + rm -rf $tmpdir
> > + if [ $ret -ne 0 ]; then
> > + echo "Metric validation return with errors. Please check metrics reported with errors."
> > + fi
> > +done
> > exit $ret
> >
>