Message-ID: <CAP-5=fWGGQh_Kwr5mWPQv6RO=o8bk2mmShJ6MjR9i1v42e0Ziw@mail.gmail.com>
Date: Thu, 7 Nov 2024 09:12:23 -0800
From: Ian Rogers <irogers@...gle.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, Perry Taylor <perry.taylor@...el.com>,
Samantha Alt <samantha.alt@...el.com>, Caleb Biggers <caleb.biggers@...el.com>,
Weilin Wang <weilin.wang@...el.com>, Edward Baker <edward.baker@...el.com>
Subject: Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving
utilization on Intel
On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > The ports metric group contains a metric for each port giving its
> > utilization as a ratio of cycles. The metrics are created by looking
> > for UOPS_DISPATCHED.PORT events.
> >
> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> > tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
> > 1 file changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> > index f4707e964f75..3ef4eb868580 100755
> > --- a/tools/perf/pmu-events/intel_metrics.py
> > +++ b/tools/perf/pmu-events/intel_metrics.py
> > @@ -1,12 +1,13 @@
> > #!/usr/bin/env python3
> > # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
> > - JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> > - MetricGroup, MetricRef, Select)
> > + JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
> > + Metric, MetricGroup, MetricRef, Select)
> > import argparse
> > import json
> > import math
> > import os
> > +import re
> > from typing import Optional
> >
> > # Global command line arguments.
> > @@ -260,6 +261,33 @@ def IntelBr():
> > description="breakdown of retired branch instructions")
> >
> >
> > +def IntelPorts() -> Optional[MetricGroup]:
> > + pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
> > +
> > + core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
> > + "CPU_CLK_UNHALTED.DISTRIBUTED",
> > + "cycles")
> > + # Number of CPU cycles scaled for SMT.
> > + smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
> > +
> > + metrics = []
> > + for x in pipeline_events:
> > + if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
> > + name = x["EventName"]
> > + port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
> > + if name.endswith("_CORE"):
> > + cyc = core_cycles
> > + else:
> > + cyc = smt_cycles
> > + metrics.append(Metric(port, f"{port} utilization (higher is better)",
> > + d_ratio(Event(name), cyc), "100%"))
>
>
> The generated metric depends heavily on the event name, which is very
> fragile. We will probably have the same event in a new generation, but
> with a different name. Long-term maintenance could be a problem.
> Is there an idea for how to keep the event names in sync for new generations?
I agree that it is fragile, but it is also strangely robust: as you
say, new generations will gain support if they follow the same naming
convention. We have tests that load-bearing metrics exist on our
platforms, so maybe the appropriate place to test for existence is in
Weilin's metrics test.
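
Something like the following is roughly what I have in mind (purely an
illustrative sketch; the real check belongs in Weilin's metrics test,
and the paths and helper names here are made up):

#!/usr/bin/env python3
# Hypothetical existence check: if a model's pipeline.json defines
# UOPS_DISPATCHED.PORT* events then the generated metrics JSON for
# that model should contain a "ports" metric group.
import json
import re
import sys


def model_has_port_events(pipeline_path: str) -> bool:
    events = json.load(open(pipeline_path))
    return any("EventName" in e and
               re.match(r"UOPS_DISPATCHED\.PORT", e["EventName"])
               for e in events)


def metrics_have_ports_group(metrics_path: str) -> bool:
    metrics = json.load(open(metrics_path))
    return any("ports" in m.get("MetricGroup", "").split(";")
               for m in metrics)


if __name__ == "__main__":
    pipeline_path, metrics_path = sys.argv[1], sys.argv[2]
    if (model_has_port_events(pipeline_path) and
            not metrics_have_ports_group(metrics_path)):
        sys.exit("expected a 'ports' metric group for this model")
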
> Maybe we should improve the event generation script and do an automatic
> check to tell which metrics are missing. Then we can decide whether to
> update to the new event name, drop the metric, or add a different metric.
So I'm not sure it is a bug to not have the metric; if it were, we
could just throw rather than return None. We're going to run the
script for every model, including old models like Nehalem, so I've
generally kept it as None. I think future work on testing is probably
the best approach. People noticing a missing metric would also
indicate that the metric is used (not that the script aims for that :-) ).
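
To spell out the shape (just a sketch of the pattern, not the actual
metric.py code): each group generator returns None on models where its
events don't exist, and the caller simply drops the Nones instead of
treating a missing group as an error:

from typing import List, Optional


class MetricGroup:  # stand-in for the real metric.MetricGroup
    pass


def IntelPorts() -> Optional[MetricGroup]:
    # Returns None (rather than raising) on models such as Nehalem
    # that have no UOPS_DISPATCHED.PORT* events.
    return None


# Hypothetical helper; the real script builds its list differently.
def gather_groups(candidates: List[Optional[MetricGroup]]) -> List[MetricGroup]:
    # Keep only the groups that apply to the model being generated.
    return [g for g in candidates if g is not None]
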
Thanks,
Ian
> Thanks,
> Kan
>
> > + if len(metrics) == 0:
> > + return None
> > +
> > + return MetricGroup("ports", metrics, "functional unit (port) utilization -- "
> > + "fraction of cycles each port is utilized (higher is better)")
> > +
> > +
> > def IntelSwpf() -> Optional[MetricGroup]:
> > ins = Event("instructions")
> > try:
> > @@ -352,6 +380,7 @@ def main() -> None:
> > Smi(),
> > Tsx(),
> > IntelBr(),
> > + IntelPorts(),
> > IntelSwpf(),
> > ])
> >
>