Open Source and information security mailing list archives
 
Message-ID: <ca45ff5e-3d1a-4149-8efe-7615dd3581ff@linux.intel.com>
Date: Wed, 5 Feb 2025 10:46:56 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>, "Falcon, Thomas"
 <thomas.falcon@...el.com>
Cc: "Baker, Edward" <edward.baker@...el.com>,
 "alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
 "Biggers, Caleb" <caleb.biggers@...el.com>,
 "mpetlan@...hat.com" <mpetlan@...hat.com>,
 "Taylor, Perry" <perry.taylor@...el.com>,
 "Hunter, Adrian" <adrian.hunter@...el.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "mingo@...hat.com" <mingo@...hat.com>,
 "linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
 "manivannan.sadhasivam@...aro.org" <manivannan.sadhasivam@...aro.org>,
 "peterz@...radead.org" <peterz@...radead.org>,
 "Alt, Samantha" <samantha.alt@...el.com>,
 "mark.rutland@....com" <mark.rutland@....com>,
 "Wang, Weilin" <weilin.wang@...el.com>, "acme@...nel.org" <acme@...nel.org>,
 "afaerber@...e.de" <afaerber@...e.de>, "jolsa@...nel.org"
 <jolsa@...nel.org>, "namhyung@...nel.org" <namhyung@...nel.org>
Subject: Re: [PATCH v4 00/23] Intel vendor events and TMA 5.01 metrics



On 2025-02-04 11:58 p.m., Ian Rogers wrote:
> On Tue, Feb 4, 2025 at 8:28 PM Falcon, Thomas <thomas.falcon@...el.com> wrote:
>>
>> On Tue, 2025-02-04 at 13:35 -0800, Ian Rogers wrote:
>>> On Tue, Feb 4, 2025 at 1:33 PM Ian Rogers <irogers@...gle.com> wrote:
>>>>
>>>> Update the Intel vendor events to the latest.
>>>> Update the metrics to TMA 5.01.
>>>> Add Arrowlake and Clearwaterforest support.
>>>> Add metrics for LNL and GNR.
>>>> Address IIO uncore issue spotted on EMR, GRR, GNR, SPR and SRF.
>>>>
>>>> The perf json was generated using the script:
>>>> https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
>>>> with the generated json being in:
>>>> https://github.com/intel/perfmon/tree/main/scripts/perf
>>>>
>>>> Thanks to Perry Taylor <perry.taylor@...el.com>, Caleb Biggers
>>>> <caleb.biggers@...el.com>, Edward Baker <edward.baker@...el.com>
>>>> and
>>>> Weilin Wang <weilin.wang@...el.com> for helping get this patch
>>>> series
>>>> together.
>>>>
>>>> v4: Fix TSC events on hybrid mistakenly specifying the core PMU
>>>>     inhibiting the use of the msr PMU.
>>>> v3: Fixes for hybrid metrics that were missing PMU. Update to the
>>>>     latest events.
>>>> v2: Fix hybrid and Co-authored-by tag issues reported by
>>>>     Arnaldo. Updates to Lunarlake and Meteorlake events. Addition
>>>> of
>>>>     Clearwaterforest.
>>>
>>> Sorry, forgot to add Thomas again.
>>> https://lore.kernel.org/lkml/20250204213259.127939-1-irogers@google.com/
>>
>> Hi, I'm seeing some warnings like this and the all metrics test is
>> skipped:
>>
>> Testing tma_info_inst_mix_iparith
>> FP issues
>> Cannot resolve IDs for tma_info_inst_mix_iparith:
>> cpu_core@...T_RETIRED.ANY@ / (cpu_core@...ARITH_INST_RETIRED.SCALAR@ +
>> cpu_core@...ARITH_INST_RETIRED.VECTOR@)
>> Testing tma_info_inst_mix_iparith_avx128
>> FP issues
>> Cannot resolve IDs for tma_info_inst_mix_iparith_avx128:
>> cpu_core@...T_RETIRED.ANY@ /
>> (cpu_core@...ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ +
>> cpu_core@...ARITH_INST_RETIRED.128B_PACKED_SINGLE@)
>> Testing tma_info_inst_mix_iparith_avx256
>> FP issues
>> Cannot resolve IDs for tma_info_inst_mix_iparith_avx256:
>> cpu_core@...T_RETIRED.ANY@ /
>> (cpu_core@...ARITH_INST_RETIRED.256B_PACKED_DOUBLE@ +
>> cpu_core@...ARITH_INST_RETIRED.256B_PACKED_SINGLE@)
>> Testing tma_info_inst_mix_iparith_scalar_dp
>> FP issues
>> Cannot resolve IDs for tma_info_inst_mix_iparith_scalar_dp:
>> cpu_core@...T_RETIRED.ANY@ /
>> cpu_core@...ARITH_INST_RETIRED.SCALAR_DOUBLE@
>> Testing tma_info_inst_mix_iparith_scalar_sp
>> FP issues
>> Cannot resolve IDs for tma_info_inst_mix_iparith_scalar_sp:
>> cpu_core@...T_RETIRED.ANY@ /
>> cpu_core@...ARITH_INST_RETIRED.SCALAR_SINGLE@
> 
> Thanks Tom, we've gone from a fail to skip - so progress! I think it
> actually isn't something to worry about. These metrics are measuring
> vector and floating point things. We run a workload, when testing the
> metrics, that doesn't have floating point and vector operations. This
> causes issues with metrics for these instructions as the counters
> don't count anything. Because of this I added some logic to just skip
> when we see these failures:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/stat_all_metrics.sh?h=perf-tools-next#n51
> but a better fix would be to have a workload with FP and AMX operations.
> 
> You could test these metrics work manually, by running something like:
> $ perf stat -M tma_info_inst_mix_iparith <benchmark>
> where <benchmark> would need to contain FP or AMX instructions.
> 

It should be OK to skip the "FP issues" cases, but "Cannot resolve IDs"
seems to be a different issue.

I found a similar error when I ran perf stat on my Arrow Lake machine.

$ sudo ./perf stat
Cannot resolve IDs for tma_memory_bound: topdown\-mem\-bound / (topdown\-fe\-bound + topdown\-bad\-spec + topdown\-retiring + topdown\-be\-bound) + 0 * slots

I think the warning appears because perf doesn't find all of the matching events.
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=tmp.perf-tools-next#n326


I then checked the event list of tma_memory_bound.
For cpu_atom, it requires slots and topdown-mem-bound, which should
only be available on the p-core.

+    {
+        "BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck",
+        "DefaultMetricgroupName": "TopdownL2",
+        "MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * slots",
+        "MetricGroup": "Backend;Default;Slots;TmaL2;TopdownL2;tma_L2_group;tma_backend_bound_group",
+        "MetricName": "tma_memory_bound",
+        "MetricThreshold": "tma_memory_bound > 0.2 & tma_backend_bound > 0.2",
+        "MetricgroupNoGroup": "TopdownL2;Default",
+        "PublicDescription": "This metric represents fraction of slots the Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots where pipeline is likely stalled due to demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory demand loads which coincides with execution units starvation; in addition to (2) cases where stores could impose backpressure on the pipeline when many of them get buffered at the same time (less common out of the two)",
+        "ScaleUnit": "100%",
+        "Unit": "cpu_atom"
+    },
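
To illustrate the mismatch, here is a rough Python sketch (not perf's actual
resolution code) that pulls the event names out of the MetricExpr above and
reports those missing from a PMU's event list. The helper names and the
cpu_atom event set are assumptions made up for the example; the real list
would come from the machine. The identifier regex is a simplification that
only handles plain event names and perf's "\-" escape.

```python
import re

# Matches identifiers such as "slots" or "topdown\-mem\-bound".
# Simplification: expression functions and keywords are not filtered out.
_ID = re.compile(r"[A-Za-z_](?:[\w.]|\\-)*")


def metric_event_ids(expr):
    """Return the set of event names referenced by a MetricExpr string."""
    return {m.group(0).replace("\\-", "-") for m in _ID.finditer(expr)}


def unresolved_ids(expr, available_events):
    """Return the referenced event names that the PMU does not provide."""
    return sorted(metric_event_ids(expr) - set(available_events))


# MetricExpr of tma_memory_bound after JSON decoding ("\\-" becomes "\-").
expr = (r"topdown\-mem\-bound / (topdown\-fe\-bound + topdown\-bad\-spec"
        r" + topdown\-retiring + topdown\-be\-bound) + 0 * slots")

# Hypothetical cpu_atom event list, assumed for illustration only.
atom_events = {"topdown-fe-bound", "topdown-bad-spec",
               "topdown-retiring", "topdown-be-bound"}

print(unresolved_ids(expr, atom_events))
```

With that assumed event list, the two names left over are slots and
topdown-mem-bound, which matches what I see for cpu_atom.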

I haven't yet checked tma_info_inst_mix_iparith, which Thomas mentioned
above, but I suspect it is the same issue.

The perf test may have to error out when the "Cannot resolve IDs"
message is detected.
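
As a rough sketch of that ordering, written in Python rather than the shell
the test actually uses: the function name and the fail/skip/pass labels are
hypothetical, and the marker strings are only the two messages quoted in
this thread. The point is that the "Cannot resolve IDs" check runs before
the FP skip, so the skip cannot hide it.

```python
def classify_metric_output(output):
    """Classify the captured output of one metric run (hypothetical helper)."""
    # Check the resolution failure first so the FP skip cannot mask it.
    if "Cannot resolve IDs" in output:
        return "fail"
    # Counters that stay empty because the workload has no FP or vector work.
    if "FP issues" in output:
        return "skip"
    return "pass"


sample = ("Testing tma_info_inst_mix_iparith_scalar_sp\n"
          "FP issues\n"
          "Cannot resolve IDs for tma_info_inst_mix_iparith_scalar_sp:\n")
print(classify_metric_output(sample))
```

For output like Thomas's, which contains both messages, this reports a
failure instead of a skip.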

Thanks,
Kan

