linux-kernel - Re: [PATCH v2 02/10] perf stat: Don't remove all grouped events when CPU maps disagree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 3 Mar 2023 12:36:48 -0500
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Ian Rogers <irogers@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
        Ravi Bangoria <ravi.bangoria@....com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        "Steinar H. Gunderson" <sesse@...gle.com>,
        Kim Phillips <kim.phillips@....com>,
        Florian Fischer <florian.fischer@...q.space>,
        James Clark <james.clark@....com>,
        Suzuki Poulouse <suzuki.poulose@....com>,
        Sean Christopherson <seanjc@...gle.com>,
        Leo Yan <leo.yan@...aro.org>,
        John Garry <john.g.garry@...cle.com>,
        Kajol Jain <kjain@...ux.ibm.com>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v2 02/10] perf stat: Don't remove all grouped events when
 CPU maps disagree



On 2023-03-03 11:44 a.m., Ian Rogers wrote:
> On Fri, Mar 3, 2023 at 7:50 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>
>>
>>
>> On 2023-03-02 4:25 p.m., Ian Rogers wrote:
>>> If the events in an evlist's CPU map differ then the entire group is
>>> removed. For example:
>>>
>>> ```
>>> $ perf stat -e '{imc_free_running/data_read/,imc_free_running/data_write/,cs}' -a sleep 1
>>> WARNING: grouped events cpus do not match, disabling group:
>>>   anon group { imc_free_running/data_read/, imc_free_running/data_write/, cs }
>>> ```
>>>
>>> Change the behavior so that just the events not matching the leader
>>> are removed. So in the example above, just 'cs' will be removed.
>>>
>>> Modify the warning so that it is produced once for each group, rather
>>> than once for the entire evlist. Shrink the scope and size of the
>>> warning text buffer.
>>
>> For the uncore, we usually have to create a group for each uncore PMU.
>> The number of groups may be big. For example, on ICX, we have 40 CHA
>> PMUs. For SPR, there should be more CHAs. If we have something like
>> {cycles,uncore_cha/event=0x1/}, is the warning shown 40 times on ICX?
>> If so, it should be very annoying.
>>
>> Maybe it's better to keep the current behavior which only print a
>> warning once and notify the users that perf will re-group the events.
>> For the details, they can get it from the -v option.
> 
> Thanks Kan, I could imagine that but I was also worried about cases
> where there are multiple groups like:
> 
> ```
> $ perf stat -e '{imc_free_running/data_read/,cs},{uncore_clock/clockticks/,cs}'
> -a sleep 1
> WARNING: grouped events cpus do not match.
> Events with CPUs not matching the leader will be removed from the group.
>  anon group { imc_free_running/data_read/, cs }
> WARNING: grouped events cpus do not match.
> Events with CPUs not matching the leader will be removed from the group.
>  anon group { uncore_clock/clockticks/, cs }
> 
> Performance counter stats for 'system wide':
> 
>          1,255.75 MiB  imc_free_running/data_read/
>             7,571      cs
>     1,327,285,527      uncore_clock/clockticks/
>             7,571      cs
> 
>       1.002772882 seconds time elapsed
> ```
> 
> Knowing that both groups were broken there feels like a value add.
> Given that this is a warning, and it can be fixed by moving the event
> out of the group or forcing the CPUs, I lean toward being
> informative/spammy as the spam is somewhat straightforwardly fixed on
> the command line.


I did some tests with the patch. The issue I was worried about didn't
occur. The change looks good to me.

But I found another issue. If I specify a CPU set, the group removal
fails. It's not an issue of this patch. It looks like the current perf
compares the user defined CPU set, rather than the PMU's cpumask.

./perf stat -e '{cycles,uncore_cha/event=0x1/}' -C0 sleep 1

 Performance counter stats for 'CPU(s) 0':

     <not counted>      cycles
   <not supported>      uncore_cha/event=0x1/

       1.001783936 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
        echo 0 > /proc/sys/kernel/nmi_watchdog
        perf stat ...
        echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try
reorganizing the group.

Thanks,
Kan

> 
> Thanks,
> Ian
> 
>> Thanks,
>> Kan
>>>
>>> Signed-off-by: Ian Rogers <irogers@...gle.com>
>>> ---
>>>  tools/perf/builtin-stat.c | 24 +++++++++++++++---------
>>>  1 file changed, 15 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>> index d70b1ec88594..5c12ae5efce5 100644
>>> --- a/tools/perf/builtin-stat.c
>>> +++ b/tools/perf/builtin-stat.c
>>> @@ -181,14 +181,13 @@ static bool cpus_map_matched(struct evsel *a, struct evsel *b)
>>>
>>>  static void evlist__check_cpu_maps(struct evlist *evlist)
>>>  {
>>> -     struct evsel *evsel, *pos, *leader;
>>> -     char buf[1024];
>>> +     struct evsel *evsel, *warned_leader = NULL;
>>>
>>>       if (evlist__has_hybrid(evlist))
>>>               evlist__warn_hybrid_group(evlist);
>>>
>>>       evlist__for_each_entry(evlist, evsel) {
>>> -             leader = evsel__leader(evsel);
>>> +             struct evsel *leader = evsel__leader(evsel);
>>>
>>>               /* Check that leader matches cpus with each member. */
>>>               if (leader == evsel)
>>> @@ -197,19 +196,26 @@ static void evlist__check_cpu_maps(struct evlist *evlist)
>>>                       continue;
>>>
>>>               /* If there's mismatch disable the group and warn user. */
>>> -             WARN_ONCE(1, "WARNING: grouped events cpus do not match, disabling group:\n");
>>> -             evsel__group_desc(leader, buf, sizeof(buf));
>>> -             pr_warning("  %s\n", buf);
>>> -
>>> +             if (warned_leader != leader) {
>>> +                     char buf[200];
>>> +
>>> +                     pr_warning("WARNING: grouped events cpus do not match.\n"
>>> +                             "Events with CPUs not matching the leader will "
>>> +                             "be removed from the group.\n");
>>> +                     evsel__group_desc(leader, buf, sizeof(buf));
>>> +                     pr_warning("  %s\n", buf);
>>> +                     warned_leader = leader;
>>> +             }
>>>               if (verbose > 0) {
>>> +                     char buf[200];
>>> +
>>>                       cpu_map__snprint(leader->core.cpus, buf, sizeof(buf));
>>>                       pr_warning("     %s: %s\n", leader->name, buf);
>>>                       cpu_map__snprint(evsel->core.cpus, buf, sizeof(buf));
>>>                       pr_warning("     %s: %s\n", evsel->name, buf);
>>>               }
>>>
>>> -             for_each_group_evsel(pos, leader)
>>> -                     evsel__remove_from_group(pos, leader);
>>> +             evsel__remove_from_group(evsel, leader);
>>>       }
>>>  }
>>>