linux-kernel - Re: [PATCH v1 02/20] perf jevents: Add idle metric for Intel models

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fWpXHXd8Dd39o_KEcVaBkQKk=aXjYSVTWCitaY6Xm-T4A@mail.gmail.com>
Date: Fri, 1 Mar 2024 10:17:03 -0800
From: Ian Rogers <irogers@...gle.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Perry Taylor <perry.taylor@...el.com>, Samantha Alt <samantha.alt@...el.com>, 
	Caleb Biggers <caleb.biggers@...el.com>, Weilin Wang <weilin.wang@...el.com>, 
	Edward Baker <edward.baker@...el.com>, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, John Garry <john.g.garry@...cle.com>, 
	Kan Liang <kan.liang@...ux.intel.com>, Jing Zhang <renyu.zj@...ux.alibaba.com>, 
	Thomas Richter <tmricht@...ux.ibm.com>, James Clark <james.clark@....com>, 
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v1 02/20] perf jevents: Add idle metric for Intel models

On Fri, Mar 1, 2024 at 9:49 AM Andi Kleen <ak@...ux.intel.com> wrote:
>
> > +def Idle() -> Metric:
> > +  cyc = Event("msr/mperf/")
> > +  tsc = Event("msr/tsc/")
> > +  low = max(tsc - cyc, 0)
> > +  return Metric(
> > +      "idle",
> > +      "Percentage of total wallclock cycles where CPUs are in low power state (C1 or deeper sleep state)",
> > +      d_ratio(low, tsc), "100%")
>
> TBH I fail to see the advantage over the JSON. That's much more verbose
> and we don't expect to have really complex metrics anyways.

Are you saying this is more verbose or the json? Here is an example of
a json metric:

https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json?h=perf-tools-next#n652
```
    {
        "BriefDescription": "Probability of Core Bound bottleneck
hidden by SMT-profiling artifacts",
        "MetricExpr": "(100 * (1 - tma_core_bound /
(((EXE_ACTIVITY.EXE_BOUND_0_PORTS + tma_core_bound *
RS_EVENTS.EMPTY_CYCLES) / CPU_CLK_UNHALTED.THREAD *
(CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) /
CPU_CLK_UNHALTED.THREAD * CPU_CLK_UNHALTED.THREAD +
(EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring *
EXE_ACTIVITY.2_PORTS_UTIL)) / CPU_CLK_UNHALTED.THREAD if
ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL -
CYCLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS_UTIL +
tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) / CPU_CLK_UNHALTED.THREAD)
if tma_core_bound < (((EXE_ACTIVITY.EXE_BOUND_0_PORTS + tma_core_bound
* RS_EVENTS.EMPTY_CYCLES) / CPU_CLK_UNHALTED.THREAD *
(CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) /
CPU_CLK_UNHALTED.THREAD * CPU_CLK_UNHALTED.THREAD +
(EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring *
EXE_ACTIVITY.2_PORTS_UTIL)) / CPU_CLK_UNHALTED.THREAD if
ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL -
CYCLE_ACTIVITY.STALLS_MEM_ANY else (EXE_ACTIVITY.1_PORTS_UTIL +
tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) / CPU_CLK_UNHALTED.THREAD)
else 1) if tma_info_system_smt_2t_utilization > 0.5 else 0)",
        "MetricGroup": "Cor;SMT;TopdownL1;tma_L1_group",
        "MetricName": "tma_info_botlnk_core_bound_likely",
        "MetricgroupNoGroup": "TopdownL1"
    },
```

Even with common metrics like tma_core_bound, tma_retiring and
tma_info_system_smt_2t_utilization replacing sections of the metric, I
think anyone has to admit the expression is pretty unintelligible
because of its size/verbosity. To understand the metric would at a
first step involve adding newlines. Comments would be nice, etc.

> And then we have a gigantic patch kit for what gain?

I see some of the gains as:
 - metrics that are human intelligible,
 - metrics for models that are no longer being updated,
 - removing copy-paste of metrics like tsx and smi across each model's
metric json (less lines-of-code),
 - validation of events in a metric expression being in the event json
for a model,
 - removal of forward porting metrics to a new model if the event
names of the new model line up with those of previous,
 - in this patch kit there are metrics added that don't currently
exist (more metrics should be better for users - yes there can always
be bugs).

I also hope the metric grouping is clearer, etc, etc.

> The motivation was the lack of comments in JSON? We could just add some
> to the parser (e.g. with /* */ ).  And we could allow an JSON array for the
> expression to get multiple lines.

Imo, a non-json variant of json would just be taking on a tech debt
burden for something that is inferior to this approach and a wasted
effort. We already generate the json from other more intelligible
sources using python - so I don't find the approach controversial:
https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py

The goal here has been to make a bunch of inhouse metrics public. It
also gives a foundation for vendors and other concerned people to add
metrics in a concise, with documentation and safe (broken events cause
compile time failures) way. There are some similar things like common
events/metrics on ARM:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/arm64/arm/cmn/sys?h=perf-tools-next
but this lacks the structure, validation, documentation, etc. that's
present here so my preference would be for more of the common things
to be done in the python way started here.

Thanks,
Ian

> -Andi