Message-ID: <e7fc4fa1-01fd-45ac-bce5-e8a03107363a@intel.com>
Date: Fri, 12 Apr 2024 18:22:40 +0800
From: Yanfei Xu <yanfei.xu@...el.com>
To: Ian Rogers <irogers@...gle.com>
CC: <peterz@...radead.org>, <mingo@...hat.com>, <acme@...nel.org>,
<mark.rutland@....com>, <alexander.shishkin@...ux.intel.com>,
<jolsa@...nel.org>, <namhyung@...nel.org>, <adrian.hunter@...el.com>,
<kan.liang@...ux.intel.com>, <linux-perf-users@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] perf parse-events: Avoid two scenarios involving the
reordering of topdown events
Hi Ian,
On 4/11/2024 11:14 PM, Ian Rogers wrote:
> On Thu, Apr 11, 2024 at 7:53 AM Yanfei Xu <yanfei.xu@...el.com> wrote:
>>
>> We found that an event group containing slots but no topdown events
>> is still reordered to place slots first. This is unnecessary, and may
>> break tools expecting the command line order to match the printed order.
>> The issue was previously fixed [1], but the fix was discarded by [2].
>
> The requirements for topdown events are a gift that keeps on giving :-( Please:
> 1) provide the tool that is broken,
> 2) provide a reason why the tool is depending on text output that may
> vary over time (intended audience for it is humans) instead of say the
> json output that is intended for consumption by tools,
> 3) provide a test.
We noticed this issue from the regroup WARNING, which hadn't triggered
in past regular testing. After digging into the code history I noticed
there was also an earlier fix for this kind of unnecessary reordering,
whose changelog noted that tools may expect the original command line
order. From my side, I don't have such a tool.
You can refer to the link [1]. Actually it was from you. :)
[1]
https://lore.kernel.org/lkml/20220321223344.1034479-1-irogers@google.com/#/
>
>> Add an extra check on the evsel leader, via a new must_be_in_group
>> flag, to ensure the slots event is only moved if the group contains
>> non-slots topdown events.
>>
>> Without the patch:
>>
>> $ perf stat -e '{cpu/cpu-cycles/,slots}' sleep 1
>> WARNING: events were regrouped to match PMUs
>>
>> Performance counter stats for 'sleep 1':
>>
>> 2,663,256 slots:u
>> 443,876 cpu/cpu-cycles/u
>>
>> 1.001079566 seconds time elapsed
>>
>> 0.001054000 seconds user
>> 0.000000000 seconds sys
>>
>> With the patch:
>>
>> $ perf stat -e '{cpu/cpu-cycles/,slots}' sleep 1
>>
>> Performance counter stats for 'sleep 1':
>>
>> 469,039 cpu/cpu-cycles/u
>> 2,814,234 slots:u
>>
>> 1.001148306 seconds time elapsed
>>
>> 0.001123000 seconds user
>> 0.000000000 seconds sys
>>
>> In cases where both slots and topdown events are present, moving the
>> slots event to be the first event is necessary. However, there is no
>> requirement to move the topdown events to be adjacent to the slots
>> event, so keeping the original order of the topdown events is expected.
>> Furthermore, if a group doesn't have a slots event, the topdown events
>> will be unexpectedly moved to the head of the group.
>>
>> Remove the reordering of topdown events in arch_evlist__cmp().
>>
>> Without the patch:
>>
>> $ perf stat -e '{slots,cpu/cpu-cycles/,cpu/topdown-bad-spec/}' sleep 1
>> WARNING: events were regrouped to match PMUs
>>
>> Performance counter stats for 'sleep 1':
>>
>> 2,681,460 slots:u
>> 336,496 cpu/topdown-bad-spec/u
>> 446,910 cpu/cpu-cycles/u
>>
>> 1.001210088 seconds time elapsed
>>
>> 0.001160000 seconds user
>> 0.000000000 seconds sys
>>
>> With the patch:
>>
>> $ perf stat -e '{slots,cpu/cpu-cycles/,cpu/topdown-bad-spec/}' sleep 1
>>
>> Performance counter stats for 'sleep 1':
>>
>> 2,715,486 slots:u
>> 452,581 cpu/cpu-cycles/u
>> 340,766 cpu/topdown-bad-spec/u
>>
>> 1.001116709 seconds time elapsed
>>
>> 0.001111000 seconds user
>> 0.000000000 seconds sys
>>
>> [1] https://lore.kernel.org/lkml/20220321223344.1034479-1-irogers@google.com/#/
>> [2] https://lore.kernel.org/lkml/20230302041211.852330-10-irogers@google.com/#/
>>
>> Fixes: 347c2f0a0988 ("perf parse-events: Sort and group parsed events")
>> Signed-off-by: Yanfei Xu <yanfei.xu@...el.com>
>> ---
>> tools/perf/arch/x86/util/evlist.c | 13 +++++--------
>> tools/perf/arch/x86/util/evsel.c | 6 ++++++
>> tools/perf/util/evsel.h | 2 ++
>> tools/perf/util/parse-events.c | 9 ++++++---
>> 4 files changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
>> index b1ce0c52d88d..eed0a74c561a 100644
>> --- a/tools/perf/arch/x86/util/evlist.c
>> +++ b/tools/perf/arch/x86/util/evlist.c
>> @@ -75,17 +75,14 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
>>
>> int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
>> {
>> + struct evsel *leader;
>> +
>> if (topdown_sys_has_perf_metrics() &&
>> (arch_evsel__must_be_in_group(lhs) || arch_evsel__must_be_in_group(rhs))) {
>> + leader = evsel__leader(rhs);
>
> Usually the leader doesn't make sense at this point. For example, a
> metric following event parsing may group
> {cycles,slots,topdown-fe-bound} and cycles will be the leader. The
> leader's properties say nothing about the ordering of slots and
> topdown-fe-bound.
Yes, the leader at this point is temporary. Actually, what we want here
is information about the group the event belongs to, so that we know
whether the group contains topdown events.
>
>> /* Ensure the topdown slots comes first. */
>> - if (strcasestr(lhs->name, "slots") && !strcasestr(lhs->name, "uops_retired.slots"))
>> - return -1;
>> - if (strcasestr(rhs->name, "slots") && !strcasestr(rhs->name, "uops_retired.slots"))
>> - return 1;
>> - /* Followed by topdown events. */
>> - if (strcasestr(lhs->name, "topdown") && !strcasestr(rhs->name, "topdown"))
>> - return -1;
>> - if (!strcasestr(lhs->name, "topdown") && strcasestr(rhs->name, "topdown"))
>> + if (strcasestr(rhs->name, "slots") && !strcasestr(rhs->name, "uops_retired.slots")
>> + && leader->must_be_in_group)
>
> Why isn't this symmetric? As in the lhs has a test that will return -1.
The full code of the function is below; the final "return" covers the
-1 case via the default insertion-index comparison.
int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
{
	struct evsel *leader;

	if (topdown_sys_has_perf_metrics() &&
	    (arch_evsel__must_be_in_group(lhs) || arch_evsel__must_be_in_group(rhs))) {
		leader = evsel__leader(rhs);
		/* Ensure the topdown slots comes first. */
		if (strcasestr(rhs->name, "slots") &&
		    !strcasestr(rhs->name, "uops_retired.slots") &&
		    leader->must_be_in_group)
			return 1;
	}

	/* Default ordering by insertion index. */
	return lhs->core.idx - rhs->core.idx;
}
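For reference, the name test the comparator relies on can be isolated
as a tiny self-contained sketch (the helper name is mine, not perf's):
it matches the topdown "slots" event while excluding
uops_retired.slots, which merely contains the substring.

```c
#define _GNU_SOURCE /* for strcasestr() on glibc */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for the topdown slots event, false for uops_retired.slots and
 * anything else; strcasestr() makes the match case-insensitive. */
static bool is_topdown_slots(const char *name)
{
	return strcasestr(name, "slots") &&
	       !strcasestr(name, "uops_retired.slots");
}
```

The same substring approach is why the uops_retired.slots exclusion is
needed at all: there is no structured event metadata consulted here,
only the event name.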
>
>> return 1;
>> }
>>
>> diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
>> index 090d0f371891..16f42fcfbe0b 100644
>> --- a/tools/perf/arch/x86/util/evsel.c
>> +++ b/tools/perf/arch/x86/util/evsel.c
>> @@ -44,6 +44,12 @@ bool arch_evsel__must_be_in_group(const struct evsel *evsel)
>> strcasestr(evsel->name, "uops_retired.slots"))
>> return false;
>>
>> + if (strcasestr(evsel->name, "topdown")) {
>> + struct evsel *leader = evsel__leader(evsel);
>> +
>> + leader->must_be_in_group = true;
>> + }
>> +
>> return strcasestr(evsel->name, "topdown") || strcasestr(evsel->name, "slots");
>> }
>>
>> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
>> index 517cff431de2..a7ab266bc915 100644
>> --- a/tools/perf/util/evsel.h
>> +++ b/tools/perf/util/evsel.h
>> @@ -98,6 +98,8 @@ struct evsel {
>> bool bpf_counter;
>> bool use_config_name;
>> bool skippable;
>> + /* any evsels with the flag set must be in a group */
>> + bool must_be_in_group;
>
> This duplicates arch_evsel__must_be_in_group but one may be true when
> the other isn't. This doesn't make sense to me.
Indeed, the name might not be accurate. The good point is that it keeps
all the "topdown" and "slots" checks contained in
arch_evsel__must_be_in_group(). How about renaming it to
group_has_topdown? Or do you have any suggestion?
>
> I think this patch needs a rethink and the behavior with it looks
> broken and more complex than what already exists. I also think the
I considered checking for the existence of a topdown event directly in
arch_evlist__cmp(), which would make the fix more self-contained.
However, the evlist is in a broken, non-traversable state during the
comparison stage of sorting, so the check is moved ahead of the sort.
> problem is one that is more to do with the tool using the output than
> the reordering.
In my opinion, moving slots when topdown events are present is
necessary. But the reordering in the cases above is unnecessary, just
to look tidier? Why not let users order the events themselves on the
command line if needed, instead of perf doing it only for topdown?
Thanks,
Yanfei
>
> Thanks,
> Ian
>
>> int bpf_fd;
>> struct bpf_object *bpf_obj;
>> struct list_head config_terms;
>> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> index 6f8b0fa17689..37950056a661 100644
>> --- a/tools/perf/util/parse-events.c
>> +++ b/tools/perf/util/parse-events.c
>> @@ -2052,9 +2052,12 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>> */
>> pos->core.idx = idx++;
>>
>> - /* Remember an index to sort all forced grouped events together to. */
>> - if (force_grouped_idx == -1 && pos == pos_leader && pos->core.nr_members < 2 &&
>> - arch_evsel__must_be_in_group(pos))
>> + /*
>> + * Remember an index to sort all forced grouped events together to,
>> + * and check each evsel for setting must_be_in_group of its leader.
>> + */
>> + if (arch_evsel__must_be_in_group(pos) && force_grouped_idx == -1 &&
>> + pos == pos_leader && pos->core.nr_members < 2)
>> force_grouped_idx = pos->core.idx;
>> }
>>
>> --
>> 2.40.1
>>
>