Message-ID: <c48a6f46-5991-40dc-abac-f66f2706c84e@linux.intel.com>
Date: Thu, 7 Nov 2024 14:36:23 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Perry Taylor <perry.taylor@...el.com>, Samantha Alt
<samantha.alt@...el.com>, Caleb Biggers <caleb.biggers@...el.com>,
Weilin Wang <weilin.wang@...el.com>, Edward Baker <edward.baker@...el.com>
Subject: Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving
utilization on Intel
On 2024-11-07 12:12 p.m., Ian Rogers wrote:
> On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>
>>
>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>> The ports metric group contains a metric for each port giving its
>>> utilization as a ratio of cycles. The metrics are created by looking
>>> for UOPS_DISPATCHED.PORT events.
>>>
>>> Signed-off-by: Ian Rogers <irogers@...gle.com>
>>> ---
>>> tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
>>> 1 file changed, 31 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>> index f4707e964f75..3ef4eb868580 100755
>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>> @@ -1,12 +1,13 @@
>>> #!/usr/bin/env python3
>>> # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>> from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
>>> - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>>> - MetricGroup, MetricRef, Select)
>>> + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
>>> + Metric, MetricGroup, MetricRef, Select)
>>> import argparse
>>> import json
>>> import math
>>> import os
>>> +import re
>>> from typing import Optional
>>>
>>> # Global command line arguments.
>>> @@ -260,6 +261,33 @@ def IntelBr():
>>> description="breakdown of retired branch instructions")
>>>
>>>
>>> +def IntelPorts() -> Optional[MetricGroup]:
>>> + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
>>> +
>>> + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
>>> + "CPU_CLK_UNHALTED.DISTRIBUTED",
>>> + "cycles")
>>> + # Number of CPU cycles scaled for SMT.
>>> + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
>>> +
>>> + metrics = []
>>> + for x in pipeline_events:
>>> + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
>>> + name = x["EventName"]
>>> + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
>>> + if name.endswith("_CORE"):
>>> + cyc = core_cycles
>>> + else:
>>> + cyc = smt_cycles
>>> + metrics.append(Metric(port, f"{port} utilization (higher is better)",
>>> + d_ratio(Event(name), cyc), "100%"))
>>
>> The generated metric depends heavily on the event name, which is very
>> fragile. We will probably have the same event in a new generation, but
>> with a different name. Long-term maintenance could be a problem.
>> Is there a plan for how to keep the event names in sync for new
>> generations?
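
To make the naming dependence concrete, the discovery in the patch boils
down to roughly this standalone sketch (the pipeline.json path is
illustrative, not a real per-model path):

  import json
  import re

  pipeline_events = json.load(open("pipeline.json"))  # per-model event list

  for ev in pipeline_events:
      name = ev.get("EventName", "")
      if not re.search(r"^UOPS_DISPATCHED\.PORT", name):
          continue
      # The metric name is derived from the event name, e.g.
      # UOPS_DISPATCHED.PORT_0 -> "port_0"; a renamed event in a future
      # generation simply stops matching and the metric silently disappears.
      port = re.search(r"PORT_[0-9].*", name).group(0).lower()
      # _CORE-suffixed events count both SMT threads, so they are divided
      # by core cycles rather than SMT-scaled cycles.
      scaling = "core cycles" if name.endswith("_CORE") else "SMT-scaled cycles"
      print(port, "uses", scaling)
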
> I agree that it is fragile, but it is also strangely robust: as you
> say, new generations will gain support if they follow the same naming
> convention. We have tests that load-bearing metrics exist on our
> platforms, so maybe the appropriate place to test for existence is in
> Weilin's metrics test.
>
>
>> Maybe we should improve the event generation script and add an automatic
>> check that reports which metrics are missing. Then we can decide whether
>> to update the event name, drop the metric, or add a different metric.
> So I'm not sure it is a bug not to have the metric; if it were, we
> could just throw rather than return None. We're going to run the
> script for every model, including old models like Nehalem, so I've
> generally kept it as None. I think future work on testing is probably
> the best approach. People noticing a missing metric would also
> indicate that the metric is used (not that the script aims for that 🙂 ).
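
For what it's worth, the kind of check discussed above, whether it lives in
the generator or in the metrics test, could look roughly like this (a sketch
only; the helper name is made up and the arch directory argument in the
example comment just reflects how the pmu-events tree is usually laid out):

  import json
  import os
  import re

  def models_missing_port_events(arch_dir):
      """List models whose pipeline.json has no UOPS_DISPATCHED.PORT* event,
      i.e. models where the ports metric group would silently be absent."""
      missing = []
      for model in sorted(os.listdir(arch_dir)):
          path = os.path.join(arch_dir, model, "pipeline.json")
          if not os.path.exists(path):
              continue
          events = json.load(open(path))
          if not any(re.match(r"UOPS_DISPATCHED\.PORT", e.get("EventName", ""))
                     for e in events):
              missing.append(model)
      return missing

  # e.g. print(models_missing_port_events("tools/perf/pmu-events/arch/x86"))
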
Maintenance is still a concern, even if we have a way to test for the
metrics. There is already an "official" set of metrics published on
GitHub and maintained by Intel. To be honest, I don't think there is
enough energy left to also maintain these "non-official" metrics.

Missing these metrics should not be treated as a bug, so it's very
likely that such an issue would not be addressed right away. If we
cannot keep these metrics updated for future platforms, I can't see a
reason to have them.
Thanks,
Kan