Message-ID: <CAP-5=fXr4bydqx5vXdFkJW99Asa3+Bzhk8AWnQc9_g8pywDaTg@mail.gmail.com>
Date: Tue, 13 May 2025 12:07:24 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Falcon, Thomas" <thomas.falcon@...el.com>, "Wang, Weilin" <weilin.wang@...el.com>, 
	"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>, Stephane Eranian <eranian@...gle.com>
Cc: "james.clark@...aro.org" <james.clark@...aro.org>, 
	"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>, 
	"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"peterz@...radead.org" <peterz@...radead.org>, "mark.rutland@....com" <mark.rutland@....com>, 
	"mingo@...hat.com" <mingo@...hat.com>, "Hunter, Adrian" <adrian.hunter@...el.com>, 
	"acme@...nel.org" <acme@...nel.org>, "namhyung@...nel.org" <namhyung@...nel.org>, 
	"jolsa@...nel.org" <jolsa@...nel.org>
Subject: Re: [PATCH v1 2/2] perf test: Hybrid improvements for metric value
 validation test

On Tue, May 13, 2025 at 10:26 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Mon, May 12, 2025 at 2:52 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > On Mon, May 12, 2025 at 1:30 PM Falcon, Thomas <thomas.falcon@...el.com> wrote:
> > >
> > > On Mon, 2025-05-12 at 11:47 -0700, Ian Rogers wrote:
> > > > On my alderlake I currently see for the "perf metrics value validation" test:
> > > > ```
> > > > Total Test Count:  142
> > > > Passed Test Count:  139
> > > > [
> > > > Metric Relationship Error:      The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
> > > >                         is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
> > > >                         but expected value range is [tma_frontend_bound, tma_frontend_bound]
> > > >                         Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> > > > Metric Relationship Error:      The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
> > > >                         is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
> > > >                         but expected value range is [tma_backend_bound, tma_backend_bound]
> > > >                         Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> > > > Metric Relationship Error:      The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
> > > >                         is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
> > > >                         but expected value range is [tma_retiring, tma_retiring]
> > > >                         Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
> > > > Metric validation return with erros. Please check metrics reported with errors.
> > > > ```
> > > > I suspect this is due to metrics for two different CPU types being
> > > > enabled. Add a -cputype option to avoid this. The test still fails with:
> > > > ```
> > > > Total Test Count:  115
> > > > Passed Test Count:  114
> > > > [
> > > > Wrong Metric Value Error:       The collected value of metric ['tma_l2_hit_latency']
> > > >                         is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
> > > >                         but expected value range is [0, 100]]
> > > > Metric validation return with errors. Please check metrics reported with errors.
> > > > ```
> > > > which is a reproducible genuine error and likely requires a metric fix.
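> > > >
> > > > For illustration (the rule file and output directory here are
> > > > placeholders), the validator can then be pointed at one PMU at a
> > > > time:
> > > > ```
> > > > python3 perf_metric_validation.py -rule metric_rules.json \
> > > >     -output_dir /tmp/out -wl "perf bench futex hash -r 2 -s" \
> > > >     -cputype cpu_atom
> > > > ```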
> > >
> > > Hi Ian, I tested this on my alder lake and an arrow lake. All tests, including tma_l2_hit_latency,
> > > pass on my end.
> > >
> > > Tested-by: Thomas Falcon <thomas.falcon@...el.com>
> >
> > Thanks Thomas! It should also work for core_lowpower on ArrowLake. I
> > find that tma_l2_hit_latency sometimes passes for me. Trying a few
> > more times I see other failures, but they all seem to be "No Metric
> > Value Error" - perhaps these shouldn't fail the test. In the testing
> > code we're passing '-a' for system wide profiling:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/lib/perf_metric_validation.py?h=perf-tools-next#n380
> > I believe this is done so that counters for things like AVX will
> > gather values. I wonder if something is happening with
> > tma_l2_hit_latency due to the scaling of counts:
> > ```
> > $ sudo /tmp/perf/perf stat -M tma_l2_hit_latency -a sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> >      7,987,903,325      cpu_core/TOPDOWN.SLOTS/          #    210.2 %  tma_l2_hit_latency  (87.27%)
> >      3,131,119,398      cpu_core/topdown-retiring/                        (87.27%)
> >      1,910,718,811      cpu_core/topdown-mem-bound/                       (87.27%)
> >        481,456,610      cpu_core/topdown-bad-spec/                        (87.27%)
> >      1,681,347,944      cpu_core/topdown-fe-bound/                        (87.27%)
> >      2,798,109,902      cpu_core/topdown-be-bound/                        (87.27%)
> >        365,736,554      cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/         (87.27%)
> >        327,668,588      cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/          (87.30%)
> >         12,744,464      cpu_core/MEM_LOAD_RETIRED.L1_MISS/                (75.32%)
> >      1,403,250,041      cpu_core/CPU_CLK_UNHALTED.THREAD/                 (87.65%)
> >          6,657,480      cpu_core/MEM_LOAD_RETIRED.L2_HIT/                 (87.66%)
> >     59,424,499,192      TSC
> >         40,830,608      cpu_core/MEM_LOAD_RETIRED.FB_HIT/                 (62.46%)
> >      1,461,544,380      cpu_core/CPU_CLK_UNHALTED.REF_TSC/                (74.79%)
> >      1,008,604,319      duration_time
> >
> >        1.004974560 seconds time elapsed
> > ```
> > The values in parentheses are a scaling amount that is normally
> > meant for event multiplexing, but on hybrid the events aren't
> > running while scheduled on the other core type, so we're seeing
> > these odd multiplexing values and they are used to scale counts:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evsel.c?h=perf-tools-next#n599
> > I find that when I run a benchmark rather than "sleep" the issue
> > seems harder to reproduce.
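> >
> > A minimal sketch of that scaling (my naming; it mirrors the logic at
> > the evsel.c link above): a raw count is scaled by time-enabled over
> > time-running.
> > ```
> > def scale_count(val: int, ena: int, run: int) -> float:
> >     """Scale a raw count by time-enabled / time-running, as perf
> >     does when an event was only scheduled part of the time."""
> >     if run == 0:
> >         return 0.0  # never scheduled; perf shows the event as not counted
> >     return val * ena / run
> > ```
> > If ena covers the whole run but run only covers the time on
> > cpu_core, ratios like the ~87% above would scale the counts up.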
>
> Ok, after chatting with Weilin and actually paying attention to the
> warning messages, I think I see a problem. TSC (msr/tsc/) is
> aggregating data across all CPUs (it is a software event on a
> different performance monitoring unit from cpu_core) while the
> counters are only on cpu_core. So I think this means the TSC value is
> too large. However, even with restricting the CPUs I see the >100%
> problem:
> ```
> $ perf stat -M tma_l2_hit_latency -C 0-15 -a sleep 1
>
>  Performance counter stats for 'system wide':
>
>     27,985,670,146      cpu_core/TOPDOWN.SLOTS/          #    125.6 %  tma_l2_hit_latency  (87.22%)
>      9,619,906,383      cpu_core/topdown-retiring/                        (87.22%)
>      2,333,124,385      cpu_core/topdown-mem-bound/                       (87.22%)
>      3,607,656,674      cpu_core/topdown-bad-spec/                        (87.22%)
>      9,839,779,867      cpu_core/topdown-fe-bound/                        (87.22%)
>      5,244,189,749      cpu_core/topdown-be-bound/                        (87.22%)
>        442,932,231      cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/         (87.24%)
>        360,126,840      cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/          (87.63%)
>         31,264,814      cpu_core/MEM_LOAD_RETIRED.L1_MISS/                (75.26%)
>      4,761,244,040      cpu_core/CPU_CLK_UNHALTED.THREAD/                 (87.63%)
>         28,429,277      cpu_core/MEM_LOAD_RETIRED.L2_HIT/                 (87.62%)
>     33,863,490,835      TSC
>         23,533,366      cpu_core/MEM_LOAD_RETIRED.FB_HIT/                 (62.25%)
>      3,158,155,632      cpu_core/CPU_CLK_UNHALTED.REF_TSC/                (74.63%)
>      1,003,102,327      duration_time
>
>        1.001912038 seconds time elapsed
> ```
> So we still need to figure this one out. The multiplexing numbers
> still worry me.
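>
> (Assuming the usual sysfs layout, the CPUs backing the core PMU can
> be read directly; presumably that is where a -C list like 0-15 comes
> from:)
> ```
> $ cat /sys/bus/event_source/devices/cpu_core/cpus
> 0-15
> ```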

So I think the TSC bug is genuine; perhaps Kan has thoughts on how to
restrict the cpu mask to just the core cpus. The tma_l2_hit_latency
>100% bug appears to be global rather than hybrid-specific: I see the
problem on a tigerlake:
```
$ perf stat --metric-no-threshold -M tma_l2_hit_latency -a sleep 1

Performance counter stats for 'system wide':

       46,745,378      MEM_LOAD_RETIRED.FB_HIT          #    105.8 %  tma_l2_hit_latency
    1,445,788,955      CPU_CLK_UNHALTED.REF_TSC
    2,532,066,403      CPU_CLK_UNHALTED.THREAD
   40,008,507,350      TSC
       11,922,390      MEM_LOAD_RETIRED.L1_MISS
        2,587,517      MEM_LOAD_RETIRED.L2_HIT
    1,002,819,485      duration_time

      1.002198593 seconds time elapsed
```
Anyway, I think this patch series should land and we can worry about
this metric and the hybrid problems separately.

Thanks,
Ian

> >
> > > Thanks,
> > > Tom
> > > >
> > > > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > > > ---
> > > >  .../tests/shell/lib/perf_metric_validation.py   | 12 +++++++++---
> > > >  tools/perf/tests/shell/stat_metrics_values.sh   | 17 +++++++++++------
> > > >  2 files changed, 20 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py
> > > > index 0b94216c9c46..dea8ef1977bf 100644
> > > > --- a/tools/perf/tests/shell/lib/perf_metric_validation.py
> > > > +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py
> > > > @@ -35,7 +35,8 @@ class TestError:
> > > >
> > > >
> > > >  class Validator:
> > > > -    def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='', workload='true', metrics=''):
> > > > +    def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='',
> > > > +                 workload='true', metrics='', cputype='cpu'):
> > > >          self.rulefname = rulefname
> > > >          self.reportfname = reportfname
> > > >          self.rules = None
> > > > @@ -43,6 +44,7 @@ class Validator:
> > > >          self.metrics = self.__set_metrics(metrics)
> > > >          self.skiplist = set()
> > > >          self.tolerance = t
> > > > +        self.cputype = cputype
> > > >
> > > >          self.workloads = [x for x in workload.split(",") if x]
> > > >          self.wlidx = 0  # idx of current workloads
> > > > @@ -377,7 +379,7 @@ class Validator:
> > > >
> > > >      def _run_perf(self, metric, workload: str):
> > > >          tool = 'perf'
> > > > -        command = [tool, 'stat', '-j', '-M', f"{metric}", "-a"]
> > > > +        command = [tool, 'stat', '--cputype', self.cputype, '-j', '-M', f"{metric}", "-a"]
> > > >          wl = workload.split()
> > > >          command.extend(wl)
> > > >          print(" ".join(command))
> > > > @@ -443,6 +445,8 @@ class Validator:
> > > >                  if 'MetricName' not in m:
> > > >                      print("Warning: no metric name")
> > > >                      continue
> > > > +                if 'Unit' in m and m['Unit'] != self.cputype:
> > > > +                    continue
> > > >                  name = m['MetricName'].lower()
> > > >                  self.metrics.add(name)
> > > >                  if 'ScaleUnit' in m and (m['ScaleUnit'] == '1%' or m['ScaleUnit'] == '100%'):
> > > > @@ -578,6 +582,8 @@ def main() -> None:
> > > >      parser.add_argument(
> > > >          "-wl", help="Workload to run while data collection", default="true")
> > > >      parser.add_argument("-m", help="Metric list to validate", default="")
> > > > +    parser.add_argument("-cputype", help="Only test metrics for the given CPU/PMU type",
> > > > +                        default="cpu")
> > > >      args = parser.parse_args()
> > > >      outpath = Path(args.output_dir)
> > > >      reportf = Path.joinpath(outpath, 'perf_report.json')
> > > > @@ -586,7 +592,7 @@ def main() -> None:
> > > >
> > > >      validator = Validator(args.rule, reportf, debug=args.debug,
> > > >                            datafname=datafile, fullrulefname=fullrule, workload=args.wl,
> > > > -                          metrics=args.m)
> > > > +                          metrics=args.m, cputype=args.cputype)
> > > >      ret = validator.test()
> > > >
> > > >      return ret
> > > > diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh
> > > > index 279f19c5919a..30566f0b5427 100755
> > > > --- a/tools/perf/tests/shell/stat_metrics_values.sh
> > > > +++ b/tools/perf/tests/shell/stat_metrics_values.sh
> > > > @@ -16,11 +16,16 @@ workload="perf bench futex hash -r 2 -s"
> > > >  # Add -debug, save data file and full rule file
> > > >  echo "Launch python validation script $pythonvalidator"
> > > >  echo "Output will be stored in: $tmpdir"
> > > > -$PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}"
> > > > -ret=$?
> > > > -rm -rf $tmpdir
> > > > -if [ $ret -ne 0 ]; then
> > > > -     echo "Metric validation return with erros. Please check metrics reported with errors."
> > > > -fi
> > > > +for cputype in /sys/bus/event_source/devices/cpu_*; do
> > > > +     cputype=$(basename "$cputype")
> > > > +     echo "Testing metrics for: $cputype"
> > > > +     $PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}" \
> > > > +             -cputype "${cputype}"
> > > > +     ret=$?
> > > > +     rm -rf $tmpdir
> > > > +     if [ $ret -ne 0 ]; then
> > > > +             echo "Metric validation return with errors. Please check metrics reported with errors."
> > > > +     fi
> > > > +done
> > > >  exit $ret
> > > >
> > >
