linux-kernel - Re: [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fVwdaACaYCG8Jn-w+wHDNugcon8OiN9-mOAcXisVV5N0w@mail.gmail.com>
Date: Thu, 7 Nov 2024 09:19:43 -0800
From: Ian Rogers <irogers@...gle.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Perry Taylor <perry.taylor@...el.com>, 
	Samantha Alt <samantha.alt@...el.com>, Caleb Biggers <caleb.biggers@...el.com>, 
	Weilin Wang <weilin.wang@...el.com>, Edward Baker <edward.baker@...el.com>
Subject: Re: [PATCH v4 07/22] perf jevents: Add br metric group for branch
 statistics on Intel

On Thu, Nov 7, 2024 at 6:35 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > The br metric group for branches itself comprises metric groups for
> > total, taken, conditional, fused and far metric groups using json
> > events. Conditional taken and not taken metrics are specific to
> > Icelake and later generations, so the presence of the event is used to
> > determine whether the metric should exist.
> > > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> >  tools/perf/pmu-events/intel_metrics.py | 138 +++++++++++++++++++++++++
> >  1 file changed, 138 insertions(+)
> >
> > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> > index 58e243695f0a..09f7b7159e7c 100755
> > --- a/tools/perf/pmu-events/intel_metrics.py
> > +++ b/tools/perf/pmu-events/intel_metrics.py
> > @@ -123,6 +123,143 @@ def Tsx() -> Optional[MetricGroup]:
> >    ], description="Breakdown of transactional memory statistics")
> >
> >
> > +def IntelBr():
> > +  ins = Event("instructions")
> > +
> > +  def Total() -> MetricGroup:
> > +    br_all = Event ("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
> > +    br_m_all = Event("BR_MISP_RETIRED.ALL_BRANCHES",
> > +                     "BR_INST_RETIRED.MISPRED",
> > +                     "BR_MISP_EXEC.ANY")
> > +    br_clr = None
> > +    try:
> > +      br_clr = Event("BACLEARS.ANY", "BACLEARS.ALL")
> > +    except:
> > +      pass
>
> There is no guarantee to the event name. It could be changed later.
> The renaming already occurred several times, even for architectural events.
>
> I think we should test all events' presence, not just a few of them.

So the idea with the Event names is that we look for each name in turn
in the json, stopping with the first one we find. If none are found
then an exception is thrown. This means a typo can cause issues so we
also check that the event name exists in some json somewhere:
https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/metric.py#L89
I agree it isn't super perfect, we're dealing with strings, things can
be fragile..

> There could be some effort in the future to sync with the event list for
> each new generation and check if there is a renaming.

So the worst case is that a metric isn't present in a new generation,
then we update the script for a new event name and things will be
fixed. I think testing can keep this in check.

Thanks,
Ian

> Thanks,
> Kan
> > +
> > +    br_r = d_ratio(br_all, interval_sec)
> > +    ins_r = d_ratio(ins, br_all)
> > +    misp_r = d_ratio(br_m_all, br_all)
> > +    clr_r = d_ratio(br_clr, interval_sec) if br_clr else None
> > +
> > +    return MetricGroup("br_total", [
> > +        Metric("br_total_retired",
> > +               "The number of branch instructions retired per second.", br_r,
> > +               "insn/s"),
> > +        Metric(
> > +            "br_total_mispred",
> > +            "The number of branch instructions retired, of any type, that were "
> > +            "not correctly predicted as a percentage of all branch instrucions.",
> > +            misp_r, "100%"),
> > +        Metric("br_total_insn_between_branches",
> > +               "The number of instructions divided by the number of branches.",
> > +               ins_r, "insn"),
> > +        Metric("br_total_insn_fe_resteers",
> > +               "The number of resync branches per second.", clr_r, "req/s"
> > +               ) if clr_r else None
> > +    ])
> > +
> > +  def Taken() -> MetricGroup:
> > +    br_all = Event("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
> > +    br_m_tk = None
> > +    try:
> > +      br_m_tk = Event("BR_MISP_RETIRED.NEAR_TAKEN",
> > +                      "BR_MISP_RETIRED.TAKEN_JCC",
> > +                      "BR_INST_RETIRED.MISPRED_TAKEN")
> > +    except:
> > +      pass
> > +    br_r = d_ratio(br_all, interval_sec)
> > +    ins_r = d_ratio(ins, br_all)
> > +    misp_r = d_ratio(br_m_tk, br_all) if br_m_tk else None
> > +    return MetricGroup("br_taken", [
> > +        Metric("br_taken_retired",
> > +               "The number of taken branches that were retired per second.",
> > +               br_r, "insn/s"),
> > +        Metric(
> > +            "br_taken_mispred",
> > +            "The number of retired taken branch instructions that were "
> > +            "mispredicted as a percentage of all taken branches.", misp_r,
> > +            "100%") if misp_r else None,
> > +        Metric(
> > +            "br_taken_insn_between_branches",
> > +            "The number of instructions divided by the number of taken branches.",
> > +            ins_r, "insn"),
> > +    ])
> > +
> > +  def Conditional() -> Optional[MetricGroup]:
> > +    try:
> > +      br_cond = Event("BR_INST_RETIRED.COND",
> > +                      "BR_INST_RETIRED.CONDITIONAL",
> > +                      "BR_INST_RETIRED.TAKEN_JCC")
> > +      br_m_cond = Event("BR_MISP_RETIRED.COND",
> > +                        "BR_MISP_RETIRED.CONDITIONAL",
> > +                        "BR_MISP_RETIRED.TAKEN_JCC")
> > +    except:
> > +      return None
> > +
> > +    br_cond_nt = None
> > +    br_m_cond_nt = None
> > +    try:
> > +      br_cond_nt = Event("BR_INST_RETIRED.COND_NTAKEN")
> > +      br_m_cond_nt = Event("BR_MISP_RETIRED.COND_NTAKEN")
> > +    except:
> > +      pass
> > +    br_r = d_ratio(br_cond, interval_sec)
> > +    ins_r = d_ratio(ins, br_cond)
> > +    misp_r = d_ratio(br_m_cond, br_cond)
> > +    taken_metrics = [
> > +        Metric("br_cond_retired", "Retired conditional branch instructions.",
> > +               br_r, "insn/s"),
> > +        Metric("br_cond_insn_between_branches",
> > +               "The number of instructions divided by the number of conditional "
> > +               "branches.", ins_r, "insn"),
> > +        Metric("br_cond_mispred",
> > +               "Retired conditional branch instructions mispredicted as a "
> > +               "percentage of all conditional branches.", misp_r, "100%"),
> > +    ]
> > +    if not br_m_cond_nt:
> > +      return MetricGroup("br_cond", taken_metrics)
> > +
> > +    br_r = d_ratio(br_cond_nt, interval_sec)
> > +    ins_r = d_ratio(ins, br_cond_nt)
> > +    misp_r = d_ratio(br_m_cond_nt, br_cond_nt)
> > +
> > +    not_taken_metrics = [
> > +        Metric("br_cond_retired", "Retired conditional not taken branch instructions.",
> > +               br_r, "insn/s"),
> > +        Metric("br_cond_insn_between_branches",
> > +               "The number of instructions divided by the number of not taken conditional "
> > +               "branches.", ins_r, "insn"),
> > +        Metric("br_cond_mispred",
> > +               "Retired not taken conditional branch instructions mispredicted as a "
> > +               "percentage of all not taken conditional branches.", misp_r, "100%"),
> > +    ]
> > +    return MetricGroup("br_cond", [
> > +        MetricGroup("br_cond_nt", not_taken_metrics),
> > +        MetricGroup("br_cond_tkn", taken_metrics),
> > +    ])
> > +
> > +  def Far() -> Optional[MetricGroup]:
> > +    try:
> > +      br_far = Event("BR_INST_RETIRED.FAR_BRANCH")
> > +    except:
> > +      return None
> > +
> > +    br_r = d_ratio(br_far, interval_sec)
> > +    ins_r = d_ratio(ins, br_far)
> > +    return MetricGroup("br_far", [
> > +        Metric("br_far_retired", "Retired far control transfers per second.",
> > +               br_r, "insn/s"),
> > +        Metric(
> > +            "br_far_insn_between_branches",
> > +            "The number of instructions divided by the number of far branches.",
> > +            ins_r, "insn"),
> > +    ])
> > +
> > +  return MetricGroup("br", [Total(), Taken(), Conditional(), Far()],
> > +                     description="breakdown of retired branch instructions")
> > +
> > +
> >  def main() -> None:
> >    global _args
> >
> > @@ -150,6 +287,7 @@ def main() -> None:
> >        Rapl(),
> >        Smi(),
> >        Tsx(),
> > +      IntelBr(),
> >    ])
> >
> >
>