Message-ID: <268afbc2-4e65-0f31-a023-aff4823dd8e8@amd.com>
Date: Mon, 19 Jun 2023 17:16:27 +0530
From: Sandipan Das <sandipan.das@....com>
To: Ian Rogers <irogers@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, namhyung@...nel.org, adrian.hunter@...el.com,
kjain@...ux.ibm.com, atrajeev@...ux.vnet.ibm.com,
barnali@...ux.ibm.com, ayush.jain3@....com, ananth.narayan@....com,
ravi.bangoria@....com, santosh.shukla@....com
Subject: Re: [PATCH] perf test: Retry without grouping for all metrics test

Hi Ian,

On 6/14/2023 10:10 PM, Ian Rogers wrote:
> On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <sandipan.das@....com> wrote:
>>
>> There are cases where a metric uses more events than the number of
>> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
>> counters but the "nps1_die_to_dram" metric has eight events. By default,
>> the constituent events are placed in a group. Since the events cannot be
>> scheduled at the same time, the metric is not computed. The all metrics
>> test also fails because of this.
>
> Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
> driver. When the events are added the driver should create a fake PMU,
> check that adding the group is valid and if not fail. The failure is
> picked up by the tool and it will remove the group.
>
> I appreciate the need for a time machine to make such a fix work. To
> work around the issue with the metrics, add:
> "MetricConstraint": "NO_GROUP_EVENTS",
> to each metric in the json.
>
Thanks for the suggestions. The amd_uncore driver is indeed missing group
validation checks during event init. I will send out a fix along with the
"NO_GROUP_EVENTS" workaround.
>> Before announcing failure, the test can try multiple options for each
>> available metric. After system-wide mode fails, retry once again with
>> the "--metric-no-group" option.
>>
>> E.g.
>>
>> $ sudo perf test -v 100
>>
>> Before:
>>
>> 100: perf all metrics test :
>> --- start ---
>> test child forked, pid 672731
>> Testing branch_misprediction_ratio
>> Testing all_remote_links_outbound
>> Testing nps1_die_to_dram
>> Metric 'nps1_die_to_dram' not printed in:
>> Error:
>> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
>
> This error doesn't relate to grouping, so I'm confused about having it
> in the commit message, aside from the test failure.
>
Agreed. That's the error message from the last attempt, where the test
tries to use a longer-running workload (perf bench).
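
For reference, the attempt order discussed in this thread can be sketched
roughly as below. This is only a minimal sketch under stated assumptions, not
the actual test script: the function name try_metric, the PERF variable, and
the exact workloads ("true", "sleep 0.01", "bench internals synthesize") are
illustrative, and matching on the metric name in the output is a
simplification. The third step is the proposed "--metric-no-group" retry.

```shell
#!/bin/bash
# Sketch only: PERF, try_metric and the workloads are illustrative assumptions.
PERF=${PERF:-perf}

# Return 0 as soon as one attempt prints the metric name, 1 otherwise.
try_metric() {
  local m=$1 result

  # 1. Default (per-thread) mode with a trivial workload.
  result=$($PERF stat -M "$m" true 2>&1)
  [[ "$result" == *"$m"* ]] && return 0

  # 2. System-wide mode, needed for uncore events such as data fabric ones.
  result=$($PERF stat -M "$m" -a sleep 0.01 2>&1)
  [[ "$result" == *"$m"* ]] && return 0

  # 3. System-wide again, but without placing the events in a group.
  result=$($PERF stat -M "$m" --metric-no-group -a sleep 0.01 2>&1)
  [[ "$result" == *"$m"* ]] && return 0

  # 4. Last attempt: a longer-running workload in default mode.
  result=$($PERF stat -M "$m" $PERF bench internals synthesize 2>&1)
  [[ "$result" == *"$m"* ]] && return 0

  # Only the output of the last attempt is reported on failure.
  echo "Metric '$m' not printed in:" >&2
  echo "$result" >&2
  return 1
}
```

Because only the output of the final attempt is reported, the error shown on
failure comes from the per-thread "perf bench" run rather than from the
grouped system-wide run that failed to schedule.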
- Sandipan