Message-ID: <268afbc2-4e65-0f31-a023-aff4823dd8e8@amd.com>
Date:   Mon, 19 Jun 2023 17:16:27 +0530
From:   Sandipan Das <sandipan.das@....com>
To:     Ian Rogers <irogers@...gle.com>
Cc:     linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
        mark.rutland@....com, alexander.shishkin@...ux.intel.com,
        jolsa@...nel.org, namhyung@...nel.org, adrian.hunter@...el.com,
        kjain@...ux.ibm.com, atrajeev@...ux.vnet.ibm.com,
        barnali@...ux.ibm.com, ayush.jain3@....com, ananth.narayan@....com,
        ravi.bangoria@....com, santosh.shukla@....com
Subject: Re: [PATCH] perf test: Retry without grouping for all metrics test

Hi Ian,

On 6/14/2023 10:10 PM, Ian Rogers wrote:
> On Wed, Jun 14, 2023 at 2:07 AM Sandipan Das <sandipan.das@....com> wrote:
>>
>> There are cases where a metric uses more events than the number of
>> counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four data fabric
>> counters but the "nps1_die_to_dram" metric has eight events. By default,
>> the constituent events are placed in a group. Since the events cannot be
>> scheduled at the same time, the metric is not computed. The all metrics
>> test also fails because of this.
> 
> Thanks Sandipan. So this is exposing a bug in the AMD data fabric PMU
> driver. When the events are added the driver should create a fake PMU,
> check that adding the group is valid and if not fail. The failure is
> picked up by the tool and it will remove the group.
> 
> I appreciate the need for a time machine to make such a fix work. To
> workaround the issue with the metrics add:
> "MetricConstraint": "NO_GROUP_EVENTS",
> to each metric in the json.
> 

Thanks for the suggestions. The amd_uncore driver is indeed missing group
validation checks during event init. I will send out a fix for that, along
with the "NO_GROUP_EVENTS" workaround for the affected metrics.
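
To illustrate what such a check amounts to, here is a toy Python model. It is
not the amd_uncore code: it only assumes that a group can be scheduled when it
has no more events than the PMU has counters, and that a rejected group is
reopened as individual events. The function names and event names are made up
for illustration.

```python
# Toy model of group validation; not kernel code. Assumes the only
# constraint is that a group needs one free counter per event.

def validate_group(num_counters, group_events):
    """Return True if every event in the group can be scheduled together."""
    return len(group_events) <= num_counters


def open_metric_events(num_counters, events):
    """Try the events as one group; fall back to ungrouped if rejected.

    Returns the list of groups that would end up being opened.
    """
    if validate_group(num_counters, events):
        return [list(events)]
    # Group rejected at init time: open each event on its own so that
    # the events can be multiplexed over the available counters.
    return [[e] for e in events]


# Four data fabric counters, eight events, as for nps1_die_to_dram.
# The event names are placeholders.
events = [f"event_{i}" for i in range(8)]
groups = open_metric_events(4, events)
print(len(groups))
```

Without the check, the oversized group is accepted at init and then never
scheduled, which is why the metric is not computed.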

>> Before announcing failure, the test can try multiple options for each
>> available metric. After system-wide mode fails, retry once again with
>> the "--metric-no-group" option.
>>
>> E.g.
>>
>>   $ sudo perf test -v 100
>>
>> Before:
>>
>>   100: perf all metrics test                                           :
>>   --- start ---
>>   test child forked, pid 672731
>>   Testing branch_misprediction_ratio
>>   Testing all_remote_links_outbound
>>   Testing nps1_die_to_dram
>>   Metric 'nps1_die_to_dram' not printed in:
>>   Error:
>>   Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> 
> This error doesn't relate to grouping, so I'm confused about having it
> in the commit message, aside from the test failure.
> 

Agreed. That's the error message from the last attempt, where the test
tries to use a longer-running workload (perf bench).
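
For reference, the order of attempts can be sketched roughly as below. This is
a Python rendering for illustration only, not the test's actual code. The
initial per-thread attempt, the workloads "true" and "sleep 0.01", and the
exact perf bench invocation are assumptions; `run` is a hypothetical callback
that reports whether the metric was printed.

```python
def build_attempts(metric):
    """Return the perf stat command lines to try, in order."""
    base = ["perf", "stat", "-M", metric]
    return [
        base + ["true"],                                      # per-thread (assumed first step)
        base + ["-a", "sleep", "0.01"],                       # system-wide
        base + ["-a", "--metric-no-group", "sleep", "0.01"],  # system-wide, ungrouped
        base + ["perf", "bench", "internals", "synthesize"],  # longer workload (placeholder)
    ]


def first_success(attempts, run):
    """Return the first command for which run(cmd) is true, else None."""
    for cmd in attempts:
        if run(cmd):
            return cmd
    return None
```

Since the perf bench attempt comes last and runs per-thread, its error is the
one that ends up in the output, even though the grouping failure happened
earlier.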

- Sandipan
