Message-ID: <CAP-5=fV9Fx99QmKWSqqDK23vF0dcTS+g-r-9zr6q0A2ZXWmCBw@mail.gmail.com>
Date: Wed, 14 Jun 2023 09:40:05 -0700
From: Ian Rogers <irogers@...gle.com>
To: Sandipan Das <sandipan.das@....com>
Cc: linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, namhyung@...nel.org, adrian.hunter@...el.com,
kjain@...ux.ibm.com, atrajeev@...ux.vnet.ibm.com,
barnali@...ux.ibm.com, ayush.jain3@....com, ananth.narayan@....com,
ravi.bangoria@....com, santosh.shukla@....com
Subject: Re: [PATCH] perf test: Retry without grouping for all metrics test
On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <sandipan.das@....com> wrote:
>
> There are cases where a metric uses more events than the number of
> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
> counters but the "nps1_die_to_dram" metric has eight events. By default,
> the constituent events are placed in a group. Since the events cannot be
> scheduled at the same time, the metric is not computed. The all metrics
> test also fails because of this.
Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
driver. When the events are added, the driver should create a fake PMU,
check that adding the group is valid, and fail if it is not. The tool
picks up the failure and removes the group.
I appreciate the need for a time machine to make such a fix work. To
work around the issue with the metrics, add:
"MetricConstraint": "NO_GROUP_EVENTS",
to each metric in the json.
> Before announcing failure, the test can try multiple options for each
> available metric. After system-wide mode fails, retry once again with
> the "--metric-no-group" option.
>
> E.g.
>
> $ sudo perf test -v 100
>
> Before:
>
> 100: perf all metrics test :
> --- start ---
> test child forked, pid 672731
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Metric 'nps1_die_to_dram' not printed in:
> Error:
> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
This error doesn't relate to grouping, so I'm confused about having it
in the commit message, aside from the test failure.
Thanks,
Ian
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with -1
> ---- end ----
> perf all metrics test: FAILED!
>
> After:
>
> 100: perf all metrics test :
> --- start ---
> test child forked, pid 672887
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
> Reported-by: Ayush Jain <ayush.jain3@....com>
> Signed-off-by: Sandipan Das <sandipan.das@....com>
> ---
> tools/perf/tests/shell/stat_all_metrics.sh | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh
> index 54774525e18a..1e88ea8c5677 100755
> --- a/tools/perf/tests/shell/stat_all_metrics.sh
> +++ b/tools/perf/tests/shell/stat_all_metrics.sh
> @@ -16,6 +16,13 @@ for m in $(perf list --raw-dump metrics); do
> then
> continue
> fi
> + # Failed again, possibly there are not enough counters so retry system wide
> + # mode but without event grouping.
> + result=$(perf stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
> + if [[ "$result" =~ ${m:0:50} ]]
> + then
> + continue
> + fi
> # Failed again, possibly the workload was too small so retry with something
> # longer.
> result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
> --
> 2.34.1
>
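As a rough way to check by hand whether grouping is what keeps a metric
from being printed, the retry in the patch above can be sketched as a
standalone helper. This is only an illustration, not part of the patch:
the function name metric_grouping_mode and the PERF override variable
are hypothetical, and the name match is the same loose check the test
script uses (first 50 characters of the metric name, treated as a
regex).

```shell
#!/bin/bash
# PERF is a hypothetical override so the helper can be exercised
# without a real perf binary; by default it runs "perf".
PERF=${PERF:-perf}

# Print "grouped" if the metric name shows up with default grouping in
# system-wide mode, "no-group" if it only shows up when
# --metric-no-group is passed, or "failed" if neither run prints it.
metric_grouping_mode() {
  local m=$1 result

  # First attempt: system-wide, events placed in a group (the default).
  result=$("$PERF" stat -M "$m" -a sleep 0.01 2>&1)
  if [[ "$result" =~ ${m:0:50} ]]; then
    echo "grouped"
    return 0
  fi

  # Second attempt: system-wide, but without event grouping.
  result=$("$PERF" stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
  if [[ "$result" =~ ${m:0:50} ]]; then
    echo "no-group"
    return 0
  fi

  echo "failed"
  return 1
}
```

For example, "metric_grouping_mode nps1_die_to_dram" (run with enough
privileges for system-wide mode) would be expected to print "no-group"
on a machine where the metric has more events than the PMU has
counters, assuming the grouped run fails as described above.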