Message-ID: <108f87fc28453adb52d6b8c2afd8d184e288a83e.camel@intel.com>
Date: Mon, 12 May 2025 20:30:08 +0000
From: "Falcon, Thomas" <thomas.falcon@...el.com>
To: "james.clark@...aro.org" <james.clark@...aro.org>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>, "mark.rutland@....com"
<mark.rutland@....com>, "mingo@...hat.com" <mingo@...hat.com>, "Hunter,
Adrian" <adrian.hunter@...el.com>, "acme@...nel.org" <acme@...nel.org>,
"namhyung@...nel.org" <namhyung@...nel.org>, "irogers@...gle.com"
<irogers@...gle.com>, "Wang, Weilin" <weilin.wang@...el.com>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>, "jolsa@...nel.org"
<jolsa@...nel.org>
Subject: Re: [PATCH v1 2/2] perf test: Hybrid improvements for metric value
validation test

On Mon, 2025-05-12 at 11:47 -0700, Ian Rogers wrote:
> On my Alder Lake I currently see for the "perf metrics value validation" test:
> ```
> Total Test Count: 142
> Passed Test Count: 139
> [
> Metric Relationship Error: The collected value of metric ['tma_fetch_latency', 'tma_fetch_bandwidth', 'tma_frontend_bound']
> is [31.137028] in workload(s): ['perf bench futex hash -r 2 -s']
> but expected value range is [tma_frontend_bound, tma_frontend_bound]
> Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> Metric Relationship Error: The collected value of metric ['tma_memory_bound', 'tma_core_bound', 'tma_backend_bound']
> is [6.564442] in workload(s): ['perf bench futex hash -r 2 -s']
> but expected value range is [tma_backend_bound, tma_backend_bound]
> Relationship rule description: 'Sum of the level 2 children should equal level 1 parent',
> Metric Relationship Error: The collected value of metric ['tma_light_operations', 'tma_heavy_operations', 'tma_retiring']
> is [57.806179] in workload(s): ['perf bench futex hash -r 2 -s']
> but expected value range is [tma_retiring, tma_retiring]
> Relationship rule description: 'Sum of the level 2 children should equal level 1 parent']
> Metric validation return with erros. Please check metrics reported with errors.
> ```
> I suspect it is due to metrics for two different CPU types being
> enabled at the same time. Add a -cputype option to avoid this. The
> test still fails with:
> ```
> Total Test Count: 115
> Passed Test Count: 114
> [
> Wrong Metric Value Error: The collected value of metric ['tma_l2_hit_latency']
> is [117.947088] in workload(s): ['perf bench futex hash -r 2 -s']
> but expected value range is [0, 100]]
> Metric validation return with errors. Please check metrics reported with errors.
> ```
> which is a reproducible genuine error and likely requires a metric fix.
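>
> With the change the validator invokes perf with something like (the
> metric here is just illustrative; on hybrid systems the cputype values
> come from the cpu_* PMUs under /sys/bus/event_source/devices, i.e.
> cpu_core and cpu_atom):
> ```
> perf stat --cputype cpu_core -j -M tma_frontend_bound -a perf bench futex hash -r 2 -s
> ```
> so each run collects one value per metric name from a single PMU,
> rather than one value each from cpu_core and cpu_atom.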

Hi Ian,

I tested this on my Alder Lake and an Arrow Lake. All tests, including
tma_l2_hit_latency, pass on my end.

Tested-by: Thomas Falcon <thomas.falcon@...el.com>
Thanks,
Tom
>
> Signed-off-by: Ian Rogers <irogers@...gle.com>
> ---
> .../tests/shell/lib/perf_metric_validation.py | 12 +++++++++---
> tools/perf/tests/shell/stat_metrics_values.sh | 17 +++++++++++------
> 2 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py
> index 0b94216c9c46..dea8ef1977bf 100644
> --- a/tools/perf/tests/shell/lib/perf_metric_validation.py
> +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py
> @@ -35,7 +35,8 @@ class TestError:
>
>
> class Validator:
> - def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='', workload='true', metrics=''):
> + def __init__(self, rulefname, reportfname='', t=5, debug=False, datafname='', fullrulefname='',
> + workload='true', metrics='', cputype='cpu'):
> self.rulefname = rulefname
> self.reportfname = reportfname
> self.rules = None
> @@ -43,6 +44,7 @@ class Validator:
> self.metrics = self.__set_metrics(metrics)
> self.skiplist = set()
> self.tolerance = t
> + self.cputype = cputype
>
> self.workloads = [x for x in workload.split(",") if x]
> self.wlidx = 0 # idx of current workloads
> @@ -377,7 +379,7 @@ class Validator:
>
> def _run_perf(self, metric, workload: str):
> tool = 'perf'
> - command = [tool, 'stat', '-j', '-M', f"{metric}", "-a"]
> + command = [tool, 'stat', '--cputype', self.cputype, '-j', '-M', f"{metric}", "-a"]
> wl = workload.split()
> command.extend(wl)
> print(" ".join(command))
> @@ -443,6 +445,8 @@ class Validator:
> if 'MetricName' not in m:
> print("Warning: no metric name")
> continue
> + if 'Unit' in m and m['Unit'] != self.cputype:
> + continue
> name = m['MetricName'].lower()
> self.metrics.add(name)
> if 'ScaleUnit' in m and (m['ScaleUnit'] == '1%' or m['ScaleUnit'] == '100%'):
> @@ -578,6 +582,8 @@ def main() -> None:
> parser.add_argument(
> "-wl", help="Workload to run while data collection", default="true")
> parser.add_argument("-m", help="Metric list to validate", default="")
> + parser.add_argument("-cputype", help="Only test metrics for the given CPU/PMU type",
> + default="cpu")
> args = parser.parse_args()
> outpath = Path(args.output_dir)
> reportf = Path.joinpath(outpath, 'perf_report.json')
> @@ -586,7 +592,7 @@ def main() -> None:
>
> validator = Validator(args.rule, reportf, debug=args.debug,
> datafname=datafile, fullrulefname=fullrule, workload=args.wl,
> - metrics=args.m)
> + metrics=args.m, cputype=args.cputype)
> ret = validator.test()
>
> return ret
> diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh
> index 279f19c5919a..30566f0b5427 100755
> --- a/tools/perf/tests/shell/stat_metrics_values.sh
> +++ b/tools/perf/tests/shell/stat_metrics_values.sh
> @@ -16,11 +16,16 @@ workload="perf bench futex hash -r 2 -s"
> # Add -debug, save data file and full rule file
> echo "Launch python validation script $pythonvalidator"
> echo "Output will be stored in: $tmpdir"
> -$PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}"
> -ret=$?
> -rm -rf $tmpdir
> -if [ $ret -ne 0 ]; then
> - echo "Metric validation return with erros. Please check metrics reported with errors."
> -fi
> +for cputype in /sys/bus/event_source/devices/cpu_*; do
> + cputype=$(basename "$cputype")
> + echo "Testing metrics for: $cputype"
> + $PYTHON $pythonvalidator -rule $rulefile -output_dir $tmpdir -wl "${workload}" \
> + -cputype "${cputype}"
> + ret=$?
> + rm -rf $tmpdir
> + if [ $ret -ne 0 ]; then
> + echo "Metric validation return with errors. Please check metrics reported with errors."
> + fi
> +done
> exit $ret
>