Message-ID: <c48a6f46-5991-40dc-abac-f66f2706c84e@linux.intel.com>
Date: Thu, 7 Nov 2024 14:36:23 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Perry Taylor <perry.taylor@...el.com>, Samantha Alt
<samantha.alt@...el.com>, Caleb Biggers <caleb.biggers@...el.com>,
Weilin Wang <weilin.wang@...el.com>, Edward Baker <edward.baker@...el.com>
Subject: Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving
utilization on Intel
On 2024-11-07 12:12 p.m., Ian Rogers wrote:
> On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>
>>
>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>> The ports metric group contains a metric for each port giving its
>>> utilization as a ratio of cycles. The metrics are created by looking
>>> for UOPS_DISPATCHED.PORT events.
>>>
>>> Signed-off-by: Ian Rogers <irogers@...gle.com>
>>> ---
>>> tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
>>> 1 file changed, 31 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>> index f4707e964f75..3ef4eb868580 100755
>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>> @@ -1,12 +1,13 @@
>>> #!/usr/bin/env python3
>>> # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>> from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
>>> - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>>> - MetricGroup, MetricRef, Select)
>>> + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
>>> + Metric, MetricGroup, MetricRef, Select)
>>> import argparse
>>> import json
>>> import math
>>> import os
>>> +import re
>>> from typing import Optional
>>>
>>> # Global command line arguments.
>>> @@ -260,6 +261,33 @@ def IntelBr():
>>> description="breakdown of retired branch instructions")
>>>
>>>
>>> +def IntelPorts() -> Optional[MetricGroup]:
>>> + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
>>> +
>>> + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
>>> + "CPU_CLK_UNHALTED.DISTRIBUTED",
>>> + "cycles")
>>> + # Number of CPU cycles scaled for SMT.
>>> + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
>>> +
>>> + metrics = []
>>> + for x in pipeline_events:
>>> + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
>>> + name = x["EventName"]
>>> + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
>>> + if name.endswith("_CORE"):
>>> + cyc = core_cycles
>>> + else:
>>> + cyc = smt_cycles
>>> + metrics.append(Metric(port, f"{port} utilization (higher is better)",
>>> + d_ratio(Event(name), cyc), "100%"))
>>
>> The generated metric depends heavily on the event name, which is very
>> fragile. We will probably have the same event in a new generation, but
>> with a different name. Long-term maintenance could be a problem.
>> Is there a plan for how to keep the event names in sync for new
>> generations?
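
To make the naming dependence concrete, the discovery in the patch boils
down to roughly this standalone sketch (the pipeline.json path is
illustrative, not a real per-model path):

  import json
  import re

  pipeline_events = json.load(open("pipeline.json"))  # per-model event list

  for ev in pipeline_events:
      name = ev.get("EventName", "")
      if not re.search(r"^UOPS_DISPATCHED\.PORT", name):
          continue
      # The metric name is derived from the event name, e.g.
      # UOPS_DISPATCHED.PORT_0 -> "port_0"; a renamed event in a future
      # generation simply stops matching and the metric silently disappears.
      port = re.search(r"PORT_[0-9].*", name).group(0).lower()
      # _CORE-suffixed events count both SMT threads, so they are divided
      # by core cycles rather than SMT-scaled cycles.
      scaling = "core cycles" if name.endswith("_CORE") else "SMT-scaled cycles"
      print(port, "uses", scaling)
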
> I agree that it is fragile, but it is also strangely robust: as you
> say, new generations will gain support if they follow the same naming
> convention. We have tests that load-bearing metrics exist on our
> platforms, so maybe the appropriate place to test for existence is in
> Weilin's metrics test.
>
>
>> Maybe we should improve the event generation script and add an automatic
>> check that reports which metrics are missing. Then we can decide whether
>> to update the event name, drop the metric, or add a different metric.
> So I'm not sure it is a bug not to have the metric; if it were, we
> could just throw rather than return None. We're going to run the
> script for every model, including old models like Nehalem, so I've
> generally kept it as None. I think future work on testing is probably
> the best approach. People noticing a missing metric would also
> indicate that the metric is used (not that the script aims for that 🙂 ).
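
For what it's worth, the kind of check discussed above, whether it lives in
the generator or in the metrics test, could look roughly like this (a sketch
only; the helper name is made up and the arch directory argument in the
example comment just reflects how the pmu-events tree is usually laid out):

  import json
  import os
  import re

  def models_missing_port_events(arch_dir):
      """List models whose pipeline.json has no UOPS_DISPATCHED.PORT* event,
      i.e. models where the ports metric group would silently be absent."""
      missing = []
      for model in sorted(os.listdir(arch_dir)):
          path = os.path.join(arch_dir, model, "pipeline.json")
          if not os.path.exists(path):
              continue
          events = json.load(open(path))
          if not any(re.match(r"UOPS_DISPATCHED\.PORT", e.get("EventName", ""))
                     for e in events):
              missing.append(model)
      return missing

  # e.g. print(models_missing_port_events("tools/perf/pmu-events/arch/x86"))
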
Maintenance is still a concern, even if we have a way to test for the
metrics. There is already an "official" set of metrics published on
GitHub and maintained by Intel. To be honest, I don't think there is
enough energy left to also maintain these "non-official" metrics.

Missing these metrics should not be treated as a bug, so it's very
likely that such an issue would not be addressed right away. If we
cannot keep these metrics updated for future platforms, I can't see a
reason to have them.
Thanks,
Kan