Message-ID: <145ce38d-67c5-47e5-9625-0ae9e9831fd9@linux.intel.com>
Date: Thu, 6 Feb 2025 13:53:41 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Thomas Falcon <thomas.falcon@...el.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
Andreas Färber <afaerber@...e.de>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Weilin Wang <weilin.wang@...el.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, Perry Taylor <perry.taylor@...el.com>,
Samantha Alt <samantha.alt@...el.com>,
Caleb Biggers <caleb.biggers@...el.com>,
Edward Baker <edward.baker@...el.com>, Michael Petlan <mpetlan@...hat.com>
Subject: Re: [PATCH v5 11/24] perf vendor events: Update/add Graniterapids
events/metrics
On 2025-02-06 12:36 p.m., Ian Rogers wrote:
> On Thu, Feb 6, 2025 at 9:11 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>>
>>
>> On 2025-02-06 11:40 a.m., Ian Rogers wrote:
>>> On Thu, Feb 6, 2025 at 6:32 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>>>
>>>> On 2025-02-05 4:33 p.m., Ian Rogers wrote:
>>>>> On Wed, Feb 5, 2025 at 1:10 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>>>>>
>>>>>> On 2025-02-05 3:23 p.m., Ian Rogers wrote:
>>>>>>> On Wed, Feb 5, 2025 at 11:11 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>>>>>>>
>>>>>>>> On 2025-02-05 12:31 p.m., Ian Rogers wrote:
>>>>>>>>> +    {
>>>>>>>>> +        "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired",
>>>>>>>>> +        "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * slots",
>>>>>>>>> +        "MetricGroup": "BvUW;TmaL1;TopdownL1;tma_L1_group",
>>>>>>>>> +        "MetricName": "tma_retiring",
>>>>>>>>> +        "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
>>>>>>>>> +        "MetricgroupNoGroup": "TopdownL1",
>>>>>>>>> +        "PublicDescription": "This category represents fraction of slots utilized by useful work i.e. issued uops that eventually get retired. Ideally; all pipeline slots would be attributed to the Retiring category. Retiring of 100% would indicate the maximum Pipeline_Width throughput was achieved. Maximizing Retiring typically increases the Instructions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is no room for more performance. For example; Heavy-operations or Microcode Assists are categorized under Retiring. They often indicate suboptimal performance and can often be optimized or avoided. Sample with: UOPS_RETIRED.SLOTS",
>>>>>>>>> +        "ScaleUnit": "100%"
>>>>>>>>> +    },
>>>>>>>>
>>>>>>>> The "Default" tag is missed for GNR as well.
>>>>>>>> It seems the new CPUIDs are not added in the script?
>>>>>>>
>>>>>>> Spotted it, we need to manually say which architectures with TopdownL1
>>>>>>> should be in Default because it was insisted upon that pre-Icelake
>>>>>>> CPUs with TopdownL1 not have TopdownL1 in Default. As you know, my
>>>>>>> preference would be to always put TopdownL1 metrics into Default.
>>>>>>>
>>>>>>
>>>>>> For future platforms, there should always be at least TopdownL1
>>>>>> support. Intel even adds extra fixed counters for the TopdownL1
>>>>>> events.
>>>>>>
>>>>>> Maybe the script should be changed to only mark the old pre-Icelake
>>>>>> platforms as having no TopdownL1 Default. For the other platforms,
>>>>>> always add TopdownL1 as Default. That would avoid manually adding it
>>>>>> for every new platform, roughly as in the sketch below.
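>>>>>>
>>>>>> A hedged sketch of that deny-list idea for create_perf_json.py (the
>>>>>> model names and helper here are illustrative, not the script's
>>>>>> actual structure):
>>>>>>
>>>>>> # Only pre-Icelake models opt out of having TopdownL1 in the
>>>>>> # Default metric group; anything newer gets it automatically,
>>>>>> # so new CPUIDs need no manual edit.
>>>>>> PRE_ICELAKE_NO_TOPDOWN_DEFAULT = {
>>>>>>     'broadwell', 'broadwellde', 'broadwellx',
>>>>>>     'skylake', 'skylakex', 'cascadelakex',
>>>>>> }
>>>>>>
>>>>>> def topdown_l1_in_default(model: str) -> bool:
>>>>>>     """TopdownL1 goes into Default unless the model predates
>>>>>>     Icelake's fixed-counter topdown support."""
>>>>>>     return model not in PRE_ICELAKE_NO_TOPDOWN_DEFAULT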
>>>>>
>>>>> That's fair. What about TopdownL2, which is currently only in the
>>>>> Default set for SPR?
>>>>>
>>>>
>>>> Yes, TopdownL2 is a bit tricky, since it requires many more events.
>>>> Could you please set it just for SPR/EMR/GNR for now?
>>>>
>>>> I will ask around internally and come up with a long-term solution
>>>> for TopdownL2.
>>>
>>> Thanks Kan, I've updated the script in the existing way for now. Thomas
>>> saw another issue with TSC which is also fixed. I'm trying to
>>> understand what happened with it before sending out v6:
>>> https://lore.kernel.org/lkml/4f42946ffdf474fbf8aeaa142c25a25ebe739b78.camel@intel.com/
>>> """
>>> There are also some errors like this,
>>>
>>> Testing tma_cisc
>>> Metric contains missing events
>>> Cannot resolve IDs for tma_cisc: cpu_atom@TOPDOWN_FE_BOUND.CISC@ / (5 * cpu_atom@CPU_CLK_UNHALTED.CORE@)
>>> """
>>> But checking the json I wasn't able to spot a model with the metric
>>> and without these json events. Knowing the model would make my life
>>> easier :-)
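>>>
>>> One way to narrow it down (a standalone sketch assuming a Linux
>>> source checkout, not an existing perf test): scan the vendor json
>>> for models that define the metric but not the event it references.
>>>
>>> import json
>>> import pathlib
>>>
>>> # Flag any model directory under the x86 vendor events that has
>>> # the tma_cisc metric but no TOPDOWN_FE_BOUND.CISC event.
>>> base = pathlib.Path('tools/perf/pmu-events/arch/x86')
>>> for model in sorted(p for p in base.iterdir() if p.is_dir()):
>>>     events, metrics = set(), set()
>>>     for f in model.glob('*.json'):
>>>         data = json.loads(f.read_text())
>>>         if not isinstance(data, list):
>>>             continue  # e.g. metricgroups.json is a dict, skip it
>>>         for entry in data:
>>>             events.add(entry.get('EventName'))
>>>             metrics.add(entry.get('MetricName'))
>>>     if 'tma_cisc' in metrics and 'TOPDOWN_FE_BOUND.CISC' not in events:
>>>         print(model.name)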
>>>
>>
>> The problem is likely caused by the fundamental Topdown metrics, e.g.,
>> tma_frontend_bound, since the MetricThreshold of tma_cisc requires
>> the Topdown metrics.
>>
>> $ ./perf stat -M tma_frontend_bound
>> Cannot resolve IDs for tma_frontend_bound:
>> cpu_atom@TOPDOWN_FE_BOUND.ALL@ / (8 * cpu_atom@CPU_CLK_UNHALTED.CORE@)
>>
>>
>> The metric itself is correct.
>>
>> + "BriefDescription": "Counts the number of issue slots that were
>> not consumed by the backend due to frontend stalls.",
>> + "MetricExpr": "cpu_atom@...DOWN_FE_BOUND.ALL@ / (8 *
>> cpu_atom@..._CLK_UNHALTED.CORE@)",
>> + "MetricGroup": "TopdownL1;tma_L1_group",
>> + "MetricName": "tma_frontend_bound",
>> + "MetricThreshold": "(tma_frontend_bound >0.20)",
>> + "MetricgroupNoGroup": "TopdownL1",
>> + "ScaleUnit": "100%",
>> + "Unit": "cpu_atom"
>> + },
>>
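>> (Sanity-checking the expression with made-up counts: the event counts
>> issue slots lost to the frontend and the divisor assumes 8 slots per
>> cycle, so 4e9 lost slots over 1e9 unhalted core cycles gives
>> 4e9 / (8 * 1e9) = 0.5, which the 100% ScaleUnit reports as 50%
>> frontend bound.)
>>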
>> However, when I dump the debug information with
>> ./perf stat -M tma_frontend_bound -vvv
>>
>> I get the output below. I have no idea where the slots event comes
>> from. It seems the perf code mixes up the p-core metrics with the
>> e-core metrics. But why only slots?
>> It seems to be a bug in the perf tool.
>>
>> found event cpu_atom@CPU_CLK_UNHALTED.CORE@
>> found event cpu_atom@TOPDOWN_FE_BOUND.ALL@
>> found event slots
>> Parsing metric events
>>
>> '{cpu_atom/CPU_CLK_UNHALTED.CORE,metric-id=cpu_atom!3CPU_CLK_UNHALTED.CORE!3/,cpu_atom/TOPDOWN_FE_BOUND.ALL,metric-id=cpu_atom!3TOPDOWN_FE_BOUND.ALL!3/,slots/metric-id=slots/}:W'
It's because perf adds "slots" as a tool event for the e-core Topdown
metrics, but there is no "slots" event on the e-core.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/metricgroup.c#n1481
I will check why the "slots" event is added as a tool event for the
e-core. That doesn't make sense.
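
For what it's worth, whether a PMU exposes a "slots" event at all is
visible in sysfs. A standalone sketch (not perf tool code) to check the
hybrid PMUs:

import os

def pmu_has_slots(pmu: str) -> bool:
    # Every named event a PMU advertises is a file under its sysfs
    # events/ directory.
    return os.path.exists(
        f'/sys/bus/event_source/devices/{pmu}/events/slots')

# On a hybrid machine there is no plain "cpu" PMU; expect cpu_core to
# have slots and cpu_atom not to.
for pmu in ('cpu', 'cpu_core', 'cpu_atom'):
    print(pmu, 'has slots' if pmu_has_slots(pmu) else 'has no slots')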
Thanks,
Kan
>>
>
> Some more clues for me but still no model name :-)
> If this were in the metric json I'd expect the issue to be here:
> https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py#L1626
> but it appears the PMU layer in perf is somehow injecting events - I wasn't
> aware this happened, but I don't see every change and my memory is also
> fallible. I'd expect the injection, if it's happening, to be in:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/topdown.c?h=perf-tools-next
> or:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/metricgroup.c?h=perf-tools-next
> and I'm not seeing it. Could you help me debug, as I have no way to
> reproduce? Perhaps set a watchpoint on the number of entries in the evlist.
>
> Thanks,
> Ian