linux-kernel - Re: [PATCH v2] perf stat: Introduce skippable evsels

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fU9WeBga_tQKi6QKX1-t=C1w5CoPc8Rn9myR=1CxeQuMg@mail.gmail.com>
Date:   Wed, 19 Apr 2023 17:23:54 -0700
From:   Ian Rogers <irogers@...gle.com>
To:     "Liang, Kan" <kan.liang@...ux.intel.com>
Cc:     Stephane Eranian <eranian@...gle.com>,
        Andi Kleen <ak@...ux.intel.com>,
        "Yasin, Ahmad" <ahmad.yasin@...el.com>,
        "Taylor, Perry" <perry.taylor@...el.com>,
        "Alt, Samantha" <samantha.alt@...el.com>,
        "Biggers, Caleb" <caleb.biggers@...el.com>,
        "Wang, Weilin" <weilin.wang@...el.com>,
        Edward <edward.baker@...el.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Florian Fischer <florian.fischer@...q.space>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] perf stat: Introduce skippable evsels

On Wed, Apr 19, 2023 at 11:57 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
> On 2023-04-19 12:51 p.m., Ian Rogers wrote:
> >>>>> With that change I don't have a case that requires skippable evsels,
> >>>>> and so we can take that series with patch 6 over the v1 of that series
> >>>>> with this change.
> >>>>>
> >>>>
> >>>> I'm afraid this is not the only problem the commit 94b1a603fca7 ("perf
> >>>> stat: Add TopdownL1 metric as a default if present") in the
> >>>> perf-tools-next branch introduced.
> >>>>
> >>>> The topdown L2 in the perf stat default on SPR and big core of the ADL
> >>>> is still missed. I don't see a possible fix for this on the current
> >>>> perf-tools-next branch.
> >>>
> >>> I thought in its current state the json metrics for TopdownL2 on SPR
> >>> have multiplexing. Given L1 is used to drill down to L2, it seems odd
> >>> to start on L2, but given L1 is used to compute the thresholds for L2,
> >>> this should be to have both L1 and L2 on these platforms. However,
> >>> that doesn't work as you don't want multiplexing.
> >>>
> >>> This all seems backward to avoid potential multiplexing on branch miss
> >>> rate and IPC, just always having TopdownL1 seems cleanest with the
> >>> skippable evsels working around the permissions issue - as put forward
> >>> in this patch. Possibly adding L2 metrics on ADL/SPR, but only once
> >>> the multiplexing issue is resolved.
> >>>
> >>
> >> No, not just that issue. Based to what I tested these days, perf stat
> >> default has issues/regressions on most of the Intel platforms with the
> >> current perf-tools-next and perf/core branch of acme's repo.
> >>
> >> For the pre-ICL platforms:
> >> - The permission issue. (This patch tried to address.)
> >> - Unclean perf stat default. (This patch failed to address.)
> >>   Unnecessary multiplexing for cycles.
> >>   Display partial of the TopdownL1
> >>
> >> https://lore.kernel.org/lkml/d1fe801a-22d0-1f9b-b127-227b21635bd5@linux.intel.com/
> >>
> >> For SPR platforms
> >> - Topdown L2 metrics is missed, while it works with the current 6.3-rc7.
> >>
> >> For ADL/RPL platforms
> >> - Segmentation fault which I just found this morning.
> >> # ./perf stat true
> >> Segmentation fault (core dumped)
> >
> > This may also stem from the reference count checking work that Arnaldo
> > is currently merging. It is hard to test hybrid because it uses
> > non-generic code paths.
>
> There are two places which causes the Segmentation fault.
> One place is the TopdownL1.
>
> After I disable the TopdownL1 and add !counter->name as below, there are
> no errors for the ./perf stat true.
>
> (The below is just for test purpose.)
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 7a641a67486d..8e12ed1141e0 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1892,7 +1892,7 @@ static int add_default_attributes(void)
>                  * Add TopdownL1 metrics if they exist. To minimize
>                  * multiplexing, don't request threshold computation.
>                  */
> -               if (metricgroup__has_metric("TopdownL1")) {
> +               if (0 && metricgroup__has_metric("TopdownL1")) {

So hybrid has something different that causes this. Can you provide
the information to solve?

>                         struct evlist *metric_evlist = evlist__new();
>                         struct evsel *metric_evsel;
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 6b46bbb3d322..072fa56744b4 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -747,7 +747,7 @@ static void uniquify_event_name(struct evsel *counter)
>         int ret = 0;
>
>         if (counter->uniquified_name || counter->use_config_name ||
> -           !counter->pmu_name || !strncmp(counter->name, counter->pmu_name,
> +           !counter->pmu_name || !counter->name ||
> !strncmp(counter->name, counter->pmu_name,
>                                            strlen(counter->pmu_name)))
>                 return;

Is this a pre-existing hybrid bug? It is a real shame hybrid shows so
few common code paths. In general evsel__name should be preferred over
directly accessing name.

> >
> >> After the test on a hybrid machine, I incline to revert the commit
> >> 94b1a603fca7 ("perf stat: Add TopdownL1 metric as a default if present")
> >> and related patches for now.
> >>
> >> To clarify, I do not object a generic solution for the Topdown on
> >> different ARCHs. But the current generic solution aka TopdownL1 has all
> >> kinds of problems on most of Intel platforms. We should fix them first
> >> before applying to the mainline.
> >
> > No, 6.3 has many issues as do the default events/metrics:
>
> To be honest, I don't think they are real critical issues. For the first
> one, I think there was already a temporary fix. For the others, they are
> there for years.

This isn't true. The aggregation bug was raised to me by Intel and
stems from aggregation refactoring in per-thread mode done by
Namhyung.

> However, the solution you proposed in the huge patch set
> (https://lore.kernel.org/lkml/20230219092848.639226-37-irogers@google.com/)
> brings many critical issues on different Intel platforms, crashes,
> Missing features, etc.
> Also, I was just told that many of our existing tools which on top of
> the perf tool will also be broken, because all the annotations of the
> kernel top-down metrics event disappeared.
>
> So we really should revert the patches. I don't think patches 39 to 51
> are well-tested and reviewed.

The only issue I'm aware of is that hard coded use of the inaccurate
hard coded metrics now needs to switch to json metrics. This seems a
worthwhile update anyway, and not one that would justify breaking perf
stat metrics.

Thanks,
Ian

> Thanks,
> Kan
>
> >  - in 6.3 aggregation is wrong for metrics, specifically double
> > counting happens as the summary gets applied to the saved value under
> > certain conditions like repeat runs. This is solved in perf-tools-next
> > by removing the duplicated aggregation and using a single set of
> > counters. The "shadow" code is both simpler and more correct than in
> > 6.3, reverting it would be a massive regression - tbh, if it wasn't
> > for the size of the patches I think we should cherry-pick what is in
> > perf-tools-next back to 6.3 due to the size of the errors.
> >  - the default events for IPC and branch miss rate lack groups and so
> > are inaccurate when multiplexing happens - but multiplexing inaccuracy
> > has historically not been an issue for the default events/metrics.
> >  - the previous default topdown metrics were inaccurate compared to
> > the TMA metrics spreadsheet, specifically they lacked events to
> > correct for errors and the thresholds differed. This is why TopdownL2
> > is a challenge because the json metrics do things properly causing the
> > event groups not to be trivially shared.
> >  - the topdown event solution is not generic, it is not only Intel
> > specific but Intel model specific. Solving the problem via the json is
> > the most consistent way to do this and is what I suggest we move to
> > in:
> > https://lore.kernel.org/lkml/20230419005423.343862-6-irogers@google.com/
> > but you can keep pushing that perf should do something special for
> > Intel and for every Intel model. It also pushes a need into the metric
> > code to do hybrid specific things so we can ignore atom TopdownL1. In
> > this vein, hybrid is a giant mess that the code base needs cleaning up
> > from; take the recent patch to just disable generic event parsing
> > tests for MTL:
> > https://lore.kernel.org/lkml/20230411094330.653965-1-tinghao.zhang@intel.com/
> > We need to stop doing this, and have generic code, not least because
> > of complaints like:
> > https://www.kernel.org/doc/html/latest/gpu/i915.html#issues-hit-with-first-prototype-based-on-core-perf
> > and because of the crash you are reporting on ADL/RPL.
> >
> > What has motivated you here is a concern over multiplexing on branch
> > miss rate and IPC on pre-Icelake architectures that I find hard to
> > understand and is trivially remedied in time honored fashions, ie
> > profile the metrics you care about separately. A far more egregious
> > multiplexing problem is that on Icelake+ architectures we multiplex
> > fixed and generic counters unnecessarily when the same fixed counter
> > is needed or there are insufficient generic counters. This is
> > egregious as there is no user workaround and the kernel fix, to
> > iterate all hardware events the same way we do for all software
> > events, is both obvious and trivial. We've lived with this
> > multiplexing problem since Icelake and I think we can live with a
> > possible multiplexing problem, on legacy architectures, with what is
> > in perf-tools-next. The permissions problem tackled here means we
> > should merge this.
> >
> > Thanks,
> > Ian
> >
> >> Thanks,
> >> Kan