Message-ID: <CAP-5=fVaC662z2nJnTwBHMu-RE+9HkRmvSyNT5qB448+zw6e9Q@mail.gmail.com>
Date: Mon, 12 May 2025 14:52:07 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Falcon, Thomas" <thomas.falcon@...el.com>, "Wang, Weilin" <weilin.wang@...el.com>
Cc: "james.clark@...aro.org" <james.clark@...aro.org>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>, "mark.rutland@....com" <mark.rutland@....com>,
"mingo@...hat.com" <mingo@...hat.com>, "Hunter, Adrian" <adrian.hunter@...el.com>,
"acme@...nel.org" <acme@...nel.org>, "namhyung@...nel.org" <namhyung@...nel.org>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>, "jolsa@...nel.org" <jolsa@...nel.org>
Subject: Re: [PATCH v1 2/2] perf test: Hybrid improvements for metric value
validation test
On Mon, May 12, 2025 at 1:30 PM Falcon, Thomas <thomas.falcon@...el.com> wrote:
>
> On Mon, 2025-05-12 at 11:47 -0700, Ian Rogers wrote:
> > On my alderlake I currently see for the "perf metrics value validation" test:
> > ```
> > Total Test Count: 142
> > Passed Test Count: 139
> > [
> > Metric Relationship Error: The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
> > is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [tma_frontend_bound, tma_frontend_bound]
> > Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> > Metric Relationship Error: The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
> > is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [tma_backend_bound, tma_backend_bound]
> > Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> > Metric Relationship Error: The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
> > is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [tma_retiring, tma_retiring]
> > Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
> > Metric validation return with erros. Please check metrics reported with errors.
> > ```
> > I suspect it is due to two metrics for different CPU types being
> > enabled. Add a -cputype option to avoid this. The test still fails with:
> > ```
> > Total Test Count: 115
> > Passed Test Count: 114
> > [
> > Wrong Metric Value Error: The collected value of metric ['tma_l2_hit_latency']
> > is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
> > but expected value range is [0, 100]]
> > Metric validation return with errors. Please check metrics reported with errors.
> > ```
> > which is a reproducible genuine error and likely requires a metric fix.
>
> Hi Ian, I tested this on my alder lake and an arrow lake. All tests, including tma_l2_hit_latency,
> pass on my end.
>
> Tested-by: Thomas Falcon <thomas.falcon@...el.com>
Thanks Thomas! It should also work for core_lowpower on ArrowLake. I
find that tma_l2_hit_latency sometimes passes for me. Trying a few
more times I see other failures, but they all seem to be "No Metric
Value Error" - perhaps these shouldn't fail the test. In the testing
code we're passing '-a' for system-wide profiling:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/lib/perf_metric_validation.py?h=perf-tools-next#n380
I believe this is done so that counters for things like AVX will
gather values. I wonder if the tma_l2_hit_latency issue is happening
due to scaled counts:
```
$ sudo /tmp/perf/perf stat -M tma_l2_hit_latency -a sleep 1

 Performance counter stats for 'system wide':

     7,987,903,325      cpu_core/TOPDOWN.SLOTS/                   #   210.2 %  tma_l2_hit_latency       (87.27%)
     3,131,119,398      cpu_core/topdown-retiring/                                                      (87.27%)
     1,910,718,811      cpu_core/topdown-mem-bound/                                                     (87.27%)
       481,456,610      cpu_core/topdown-bad-spec/                                                      (87.27%)
     1,681,347,944      cpu_core/topdown-fe-bound/                                                      (87.27%)
     2,798,109,902      cpu_core/topdown-be-bound/                                                      (87.27%)
       365,736,554      cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/                                       (87.27%)
       327,668,588      cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/                                        (87.30%)
        12,744,464      cpu_core/MEM_LOAD_RETIRED.L1_MISS/                                              (75.32%)
     1,403,250,041      cpu_core/CPU_CLK_UNHALTED.THREAD/                                               (87.65%)
         6,657,480      cpu_core/MEM_LOAD_RETIRED.L2_HIT/                                               (87.66%)
    59,424,499,192      TSC
        40,830,608      cpu_core/MEM_LOAD_RETIRED.FB_HIT/                                               (62.46%)
     1,461,544,380      cpu_core/CPU_CLK_UNHALTED.REF_TSC/                                              (74.79%)
     1,008,604,319      duration_time

       1.004974560 seconds time elapsed
```
The values in the parentheses are a scaling amount that would
normally indicate event multiplexing, but for hybrid the events
aren't running while execution is on the other core type, so we're
seeing these odd multiplexing values and they are used to scale the
counts:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/lib/perf/evsel.c?h=perf-tools-next#n599
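For illustration, here's a rough Python sketch of that scaling (the
helper name and shape are made up here, it's not the actual evsel.c
code):
```python
# Sketch of the multiplexing scaling perf applies to a raw count: the
# kernel reports the raw value along with how long the event was
# enabled and how long it actually ran on a counter, and the count is
# extrapolated to the full enabled time.
def scale_count(raw_count: int, time_enabled: int, time_running: int) -> float:
    if time_running == 0:
        return 0.0  # the event never made it onto a counter
    return raw_count * time_enabled / time_running
```
The figure perf stat prints in parentheses is
time_running/time_enabled (e.g. the 87.27% above), so a skewed ratio
also skews the scaled counts that feed the metric.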
I find that when I run a benchmark rather than "sleep" the issue
seems harder to reproduce.
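For reference, with this patch the validation script restricts each
run to a single PMU, so on a hybrid system it prints and runs
commands of this form (metric and PMU name here are just examples):
```
perf stat --cputype cpu_core -j -M tma_l2_hit_latency -a perf bench futex hash -r 2 -s
```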
Thanks,
Ian
> Thanks,
> Tom
> >
> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> > .../tests/shell/lib/perf_metric_validation.py | 12 +++++++++---
> > tools/perf/tests/shell/stat_metrics_values.sh | 17 +++++++++++------
> > 2 files changed, 20 insertions(+), 9 deletions(-)
> >
> > diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py
> > index 0b94216c9c46..dea8ef1977bf 100644
> > --- a/tools/perf/tests/shell/lib/perf_metric_validation.py
> > +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py
> > @@ -35,7 +35,8 @@ class TestError:
> >
> >
> > class Validator:
> > - def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='', workload='true', metrics=''):
> > + def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='',
> > + workload='true', metrics='', cputype='cpu'):
> > self.rulefname = rulefname
> > self.reportfname = reportfname
> > self.rules = None
> > @@ -43,6 +44,7 @@ class Validator:
> > self.metrics = self.__set_metrics(metrics)
> > self.skiplist = set()
> > self.tolerance = t
> > + self.cputype = cputype
> >
> > self.workloads = [x for x in workload.split(",") if x]
> > self.wlidx = 0 # idx of current workloads
> > @@ -377,7 +379,7 @@ class Validator:
> >
> > def _run_perf(self, metric, workload: str):
> > tool = 'perf'
> > - command = [tool, 'stat', '-j', '-M', f"{metric}", "-a"]
> > + command = [tool, 'stat', '--cputype', self.cputype, '-j', '-M', f"{metric}", "-a"]
> > wl = workload.split()
> > command.extend(wl)
> > print(" ".join(command))
> > @@ -443,6 +445,8 @@ class Validator:
> > if 'MetricName' not in m:
> > print("Warning: no metric name")
> > continue
> > + if 'Unit' in m and m['Unit'] != self.cputype:
> > + continue
> > name = m['MetricName'].lower()
> > self.metrics.add(name)
> > if 'ScaleUnit' in m and (m['ScaleUnit'] == '1%' or m['ScaleUnit'] == '100%'):
> > @@ -578,6 +582,8 @@ def main() -> None:
> > parser.add_argument(
> > "-wl", help="Workload to run while data collection", default="true")
> > parser.add_argument("-m", help="Metric list to validate", default="")
> > + parser.add_argument("-cputype", help="Only test metrics for the given CPU/PMU type",
> > + default="cpu")
> > args = parser.parse_args()
> > outpath = Path(args.output_dir)
> > reportf = Path.joinpath(outpath, 'perf_report.json')
> > @@ -586,7 +592,7 @@ def main() -> None:
> >
> > validator = Validator(args.rule, reportf, debug=args.debug,
> > datafname=datafile, fullrulefname=fullrule, workload=args.wl,
> > - metrics=args.m)
> > + metrics=args.m, cputype=args.cputype)
> > ret = validator.test()
> >
> > return ret
> > diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh
> > index 279f19c5919a..30566f0b5427 100755
> > --- a/tools/perf/tests/shell/stat_metrics_values.sh
> > +++ b/tools/perf/tests/shell/stat_metrics_values.sh
> > @@ -16,11 +16,16 @@ workload="perf bench futex hash -r 2 -s"
> > # Add -debug, save data file and full rule file
> > echo "Launch python validation script $pythonvalidator"
> > echo "Output will be stored in: $tmpdir"
> > -$PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}"
> > -ret=$?
> > -rm -rf $tmpdir
> > -if [ $ret -ne 0 ]; then
> > - echo "Metric validation return with erros. Please check metrics reported with errors."
> > -fi
> > +for cputype in /sys/bus/event_source/devices/cpu_*; do
> > + cputype=$(basename "$cputype")
> > + echo "Testing metrics for: $cputype"
> > + $PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}" \
> > + -cputype "${cputype}"
> > + ret=$?
> > + rm -rf $tmpdir
> > + if [ $ret -ne 0 ]; then
> > + echo "Metric validation return with errors. Please check metrics reported with errors."
> > + fi
> > +done
> > exit $ret
> >
>