Message-ID: <CAP-5=fVP8p==J-K59Oo6WvZCafnsFAux3g3vW-wLKHqaASyveQ@mail.gmail.com>
Date: Wed, 9 Sep 2020 20:40:54 -0700
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>, Kajol Jain <kjain@...ux.ibm.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Jin Yao <yao.jin@...ux.intel.com>,
Hongbo Yao <yaohongbo@...wei.com>,
Thomas Richter <tmricht@...ux.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Stephane Eranian <eranian@...gle.com>, Jin@...gle.com
Subject: Re: [PATCH 2/3] perf metricgroup: Fix uncore metric expressions
On Wed, Sep 9, 2020 at 10:52 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Wed, Sep 9, 2020 at 2:53 AM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > Hi Ian,
> >
> > On Wed, Sep 9, 2020 at 5:02 PM Ian Rogers <irogers@...gle.com> wrote:
> > >
> > > A metric like DRAM_BW_Use on SkylakeX has the events
> > > uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/. Each of
> > > these opens 6 events per socket with pmu names of uncore_imc_[0-5].
> > > The current metric setup code in find_evsel_group assumes each ID
> > > maps to a single event to be recorded in metric_events. For events
> > > with multiple matches, the first event is recorded in metric_events
> > > (avoiding matching >1 event with the same name) and evlist_used is
> > > updated so that the duplicate events aren't removed when the evlist
> > > has its unused events removed.
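> > >
> > > For example, a single ID in the metric such as
> > > uncore_imc/cas_count_read/ may correspond to several evsels in the
> > > evlist (an illustrative expansion for one socket):
> > >
> > >   uncore_imc_0/cas_count_read/
> > >   uncore_imc_1/cas_count_read/
> > >   uncore_imc_2/cas_count_read/
> > >   uncore_imc_3/cas_count_read/
> > >   uncore_imc_4/cas_count_read/
> > >   uncore_imc_5/cas_count_read/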
> > >
> > > Before this change:
> > > $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 41.14 MiB uncore_imc/cas_count_read/
> > > 1,002,614,251 ns duration_time
> > >
> > > 1.002614251 seconds time elapsed
> > >
> > > After this change:
> > > $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > 157.47 MiB uncore_imc/cas_count_read/ # 0.00 DRAM_BW_Use
> >
> > Hmm.. I guess the 0.00 result is incorrect, no?
>
> Agreed. There are a number of pre-existing bugs in this code. I'll try
> to look into this one.
There was a bug where the metric_leader wasn't being set correctly and
so the accumulation of values wasn't working as expected. There's a
small fix in:
https://lore.kernel.org/lkml/20200910032632.511566-3-irogers@google.com/T/#u
that also does the change I mentioned below.
The metric still reads 0.0 following this change, as I believe there
is a bug in the metric itself. The metric expression is:
"( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) /
1000000000 ) / duration_time"
ie the counts of uncore_imc/cas_count_read/ and
uncore_imc/cas_count_write/ are first scaled up by 64 to turn a count
of cache lines into bytes, and the total is then divided by
1000000000 (a GB) to supposedly give GB/s. However, the counts have
already been read and scaled:
...
void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
...
count *= counter->scale;
...
The scale value from
/sys/devices/uncore_imc_0/events/cas_count_read.scale is
6.103515625e-5, or 64/(1024*1024), so the counts reaching the
expression are already in MiB. This means the result of the
expression is 64 times too large, but effectively in PB/s, and so too
small to display. I don't rule out there being other issues, but this
appears to be a significant one. Printing with intervals also looks
suspicious, as the count appears to only ever increase rather than
vary up and down.
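As a quick sanity check, here is a small standalone sketch (the raw
count below is made up, not taken from a real run) comparing what the
expression intends with what it computes on the already-scaled value:

#include <stdio.h>

int main(void)
{
	/* Hypothetical raw CAS count in cache lines over one second. */
	double raw = 2500000.0;
	/* Scale perf applies from cas_count_read.scale: 64/(1024*1024). */
	double scale = 6.103515625e-5;
	/* What the metric expression intends: lines * 64 bytes / 1e9, GB/s. */
	double intended = 64.0 * raw / 1000000000.0;
	/*
	 * What actually happens: the count is already scaled to MiB
	 * before the expression multiplies by 64 and divides by 1e9.
	 */
	double actual = 64.0 * (raw * scale) / 1000000000.0;

	printf("intended GB/s: %g\n", intended);
	printf("actual value : %g\n", actual);
	return 0;
}

With these numbers "intended" is 0.16 while "actual" is around 1e-5,
which rounds to the 0.00 seen above.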
Jin Yao, I don't know if you can take a look?
Thanks,
Ian
> > > 126.97 MiB uncore_imc/cas_count_write/
> > > 1,003,019,728 ns duration_time
> > >
> > > Erroneous duplication introduced in:
> > > commit 2440689d62e9 ("perf metricgroup: Remove duped metric group events").
> > >
> > > Fixes: ded80bda8bc9 ("perf expr: Migrate expr ids table to a hashmap").
> > > Reported-by: Jin, Yao <yao.jin@...ux.intel.com>
> > > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > > ---
> > [SNIP]
> > > @@ -248,6 +260,16 @@ static struct evsel *find_evsel_group(struct evlist *perf_evlist,
> > > ev = metric_events[i];
> > > ev->metric_leader = ev;
> > > set_bit(ev->idx, evlist_used);
> > > + /*
> > > + * Mark two events with identical names in the same group as
> > > + * being in use as uncore events may be duplicated for each pmu.
> > > + */
> > > + evlist__for_each_entry(perf_evlist, ev) {
> > > + if (metric_events[i]->leader == ev->leader &&
> > > + !strcmp(metric_events[i]->name, ev->name)) {
> > > + set_bit(ev->idx, evlist_used);
> >
> > I'm not sure whether they are grouped together.
> > But if so, you can use for_each_group_member(ev, leader).
>
> Good suggestion. Unfortunately the groups may be removed for things
> like the NMI watchdog or --metric-no-group, and so there wouldn't be a
> leader to follow. We could do something like this:
>
> if (metric_events[i]->leader) {
>         for_each_group_member(ev, metric_events[i]->leader) {
>                 if (!strcmp(metric_events[i]->name, ev->name))
>                         set_bit(ev->idx, evlist_used);
>         }
> } else {
>         evlist__for_each_entry(perf_evlist, ev) {
>                 if (!ev->leader && !strcmp(metric_events[i]->name, ev->name))
>                         set_bit(ev->idx, evlist_used);
>         }
> }
> }
>
> What do you think?
>
> Thanks,
> Ian
>
> > Thanks
> > Namhyung
> >
> >
> > > + }
> > > + }
> > > }
> > >
> > > return metric_events[0];
> > > --
> > > 2.28.0.526.ge36021eeef-goog
> > >