Message-ID: <CAP-5=fW_2iWEyOKao8MpMZWu7AQNX6-UKN1nEhr=mMxk0fUJKg@mail.gmail.com>
Date: Wed, 6 Dec 2023 10:50:39 -0800
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Ayush Jain <ayush.jain3@....com>,
Sandipan Das <sandipan.das@....com>,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
peterz@...radead.org, Ingo Molnar <mingo@...nel.org>,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, kjain@...ux.ibm.com,
atrajeev@...ux.vnet.ibm.com, barnali@...ux.ibm.com,
ananth.narayan@....com, ravi.bangoria@....com,
santosh.shukla@....com
Subject: Re: [PATCH] perf test: Retry without grouping for all metrics test
On Wed, Dec 6, 2023 at 9:54 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> Em Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers escreveu:
> > On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> > > Humm, I'm not able to reproduce the problem here, before applying
> > > this patch:
>
> > Please don't apply the patch. The patch masks a bug in metrics/PMUs
>
> I didn't
>
> > and the proper fix was:
> > 8d40f74ebf21 perf vendor events amd: Fix large metrics
> > https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
>
> that is upstream:
>
> ⬢[acme@...lbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
> commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
> Author: Sandipan Das <sandipan.das@....com>
> Date: Thu Jul 6 12:04:40 2023 +0530
>
> perf vendor events amd: Fix large metrics
>
> There are cases where a metric requires more events than the number of
> available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
> data fabric counters but the "nps1_die_to_dram" metric has eight events.
>
> By default, the constituent events are placed in a group and since the
> events cannot be scheduled at the same time, the metric is not computed.
> The "all metrics" test also fails because of this.
>
> Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
> the user to run perf with "--metric-no-group".
>
> E.g.
>
> $ sudo perf test -v 101
>
> Before:
>
> 101: perf all metrics test :
> --- start ---
> test child forked, pid 37131
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Metric 'nps1_die_to_dram' not printed in:
> Error:
> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with -1
> ---- end ----
> perf all metrics test: FAILED!
>
> After:
>
> 101: perf all metrics test :
> --- start ---
> test child forked, pid 43766
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
> Reported-by: Ayush Jain <ayush.jain3@....com>
> Suggested-by: Ian Rogers <irogers@...gle.com>
> Signed-off-by: Sandipan Das <sandipan.das@....com>
> Acked-by: Ian Rogers <irogers@...gle.com>
> Cc: Adrian Hunter <adrian.hunter@...el.com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Ananth Narayan <ananth.narayan@....com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Jiri Olsa <jolsa@...nel.org>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Ravi Bangoria <ravi.bangoria@....com>
> Cc: Santosh Shukla <santosh.shukla@....com>
> Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@amd.com
> Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
>
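The sketch below is a minimal Python model of the scheduling constraint described in the commit message. It rests on simplifying assumptions. A group is placed on the counters all-or-nothing. Ungrouped events, which NO_GROUP_EVENTS and `--metric-no-group` allow, are time-multiplexed, and perf stat scales their counts by time_enabled / time_running. The four-counter and eight-event figures come from the commit message. The helper names, the event list, and the plain round-robin split are illustrative assumptions, not perf's or the kernel's actual code.

```python
from typing import List

# Assumption (from the commit message): Zen/Zen 2/Zen 3 expose four data
# fabric counters, and nps1_die_to_dram needs eight events.
DF_COUNTERS = 4
# Hypothetical event list, for illustration only.
NPS1_EVENTS = [f"dram_channel_data_controller_{i}" for i in range(8)]


def group_fits(num_events: int, num_counters: int) -> bool:
    """A group is scheduled all-or-nothing: every member needs a counter at
    the same time, so a group larger than the counter pool never runs."""
    return num_events <= num_counters


def multiplex_rounds(events: List[str], num_counters: int) -> List[List[str]]:
    """Simplified model of ungrouped events sharing counters: split them into
    time slices of at most num_counters events each (round-robin)."""
    if num_counters <= 0:
        raise ValueError("need at least one counter")
    return [events[i:i + num_counters] for i in range(0, len(events), num_counters)]


def scale_count(raw: int, time_enabled: float, time_running: float) -> float:
    """Estimate the full-run count of a multiplexed event by scaling the raw
    count by time_enabled / time_running."""
    if time_running == 0:
        return 0.0  # the event never got a counter; this sketch just returns 0.0
    return raw * time_enabled / time_running


if __name__ == "__main__":
    print(group_fits(len(NPS1_EVENTS), DF_COUNTERS))  # False: 8 events > 4 counters
    rounds = multiplex_rounds(NPS1_EVENTS, DF_COUNTERS)
    print(len(rounds))  # 2 time slices, so each event runs about half the time
    print(scale_count(1000, time_enabled=2.0, time_running=1.0))  # 2000.0
```

The model also shows the trade-off behind NO_GROUP_EVENTS. The metric becomes computable, but each constituent value is extrapolated from part of the run rather than measured simultaneously with the others.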
> > > Ian, I also stumbled on this:
>
> > > [root@...e ~]# perf stat -M dram_channel_data_controller_4
> > > Cannot find metric or group `dram_channel_data_controller_4'
> > > ^C
> > > Performance counter stats for 'system wide':
>
> > > 284,908.91 msec cpu-clock # 32.002 CPUs utilized
> > > 6,485,456 context-switches # 22.763 K/sec
> > > 719 cpu-migrations # 2.524 /sec
> > > 32,800 page-faults # 115.125 /sec
>
> <SNIP>
>
> > > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
>
> > We could. I suspect the code has simply never bailed out. I'll put
> > together a patch that adds the bail-out.
>
> Great, thanks,
Patch sent:
https://lore.kernel.org/lkml/20231206183533.972028-1-irogers@google.com/
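The sketch below illustrates the bail-out discussed above. It is written in Python for brevity rather than perf's C. `resolve_metrics`, `MetricNotFound`, `run_stat` and the metric table are hypothetical names. Only the error text mirrors the message shown earlier in the thread. The point is the control flow: an unknown -M name ends the run with an error instead of falling through to default system-wide counting. Note that dram_channel_data_controller_4 is an event (the commit message's error output reports it as one), so a metric lookup cannot resolve it.

```python
import sys
from typing import Dict, List

# Hypothetical metric/group table: name -> constituent events.
KNOWN_METRICS: Dict[str, List[str]] = {
    "nps1_die_to_dram": [f"dram_channel_data_controller_{i}" for i in range(8)],
}


class MetricNotFound(Exception):
    """Raised when an -M name matches no metric or metric group."""


def resolve_metrics(arg: str, known: Dict[str, List[str]]) -> List[str]:
    """Expand a comma-separated -M argument into event names, failing on
    the first unknown name instead of skipping it."""
    events: List[str] = []
    for name in (n.strip() for n in arg.split(",")):
        if name not in known:
            raise MetricNotFound(f"Cannot find metric or group `{name}'")
        events.extend(known[name])
    return events


def run_stat(metric_arg: str, known: Dict[str, List[str]]) -> int:
    """Return a process exit status; the counting itself is out of scope."""
    try:
        events = resolve_metrics(metric_arg, known)
    except MetricNotFound as err:
        print(err, file=sys.stderr)
        return 1  # bail out rather than falling back to default events
    print(f"would count {len(events)} events")
    return 0


if __name__ == "__main__":
    status = run_stat("dram_channel_data_controller_4", KNOWN_METRICS)
    print("exit status:", status)
```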
Thanks,
Ian
> - Arnaldo