linux-kernel - Re: [Patch v2 3/5] perf x86/topdown: Don't move topdown metrics events when sorting events

Message-ID: <CAP-5=fUP1O+VnH+7PaZtEgsFUFOpjo-tRtmAyVjG=Q4GFToR7g@mail.gmail.com>
Date: Wed, 10 Jul 2024 08:07:24 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Yongwei Ma <yongwei.ma@...el.com>, Dapeng Mi <dapeng1.mi@...el.com>
Subject: Re: [Patch v2 3/5] perf x86/topdown: Don't move topdown metrics
 events when sorting events

On Wed, Jul 10, 2024 at 2:40 AM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
>
>
> On 7/10/2024 6:37 AM, Ian Rogers wrote:
> > On Mon, Jul 8, 2024 at 9:18 PM Mi, Dapeng <dapeng1.mi@...ux.intel.com> wrote:
> >>
> >> On 7/8/2024 11:08 PM, Ian Rogers wrote:
> >>> On Mon, Jul 8, 2024 at 12:40 AM Dapeng Mi <dapeng1.mi@...ux.intel.com> wrote:
> >>>> When running the perf command below, an error is reported.
> >>>>
> >>>> perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
> >>>>
> >>>> ------------------------------------------------------------
> >>>> perf_event_attr:
> >>>>   type                             4 (cpu)
> >>>>   size                             168
> >>>>   config                           0x400 (slots)
> >>>>   sample_type                      IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> >>>>   read_format                      ID|GROUP|LOST
> >>>>   disabled                         1
> >>>>   sample_id_all                    1
> >>>>   exclude_guest                    1
> >>>> ------------------------------------------------------------
> >>>> sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
> >>>> ------------------------------------------------------------
> >>>> perf_event_attr:
> >>>>   type                             4 (cpu)
> >>>>   size                             168
> >>>>   config                           0x8000 (topdown-retiring)
> >>>>   { sample_period, sample_freq }   4000
> >>>>   sample_type                      IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> >>>>   read_format                      ID|GROUP|LOST
> >>>>   freq                             1
> >>>>   sample_id_all                    1
> >>>>   exclude_guest                    1
> >>>> ------------------------------------------------------------
> >>>> sys_perf_event_open: pid -1  cpu 0  group_fd 5  flags 0x8
> >>>> sys_perf_event_open failed, error -22
> >>>>
> >>>> Error:
> >>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
> >>>>
> >>>> The error occurs because the events are regrouped and the
> >>>> topdown-retiring event is moved to directly after the slots event, so
> >>>> the topdown-retiring event then needs to do the sampling, but the Intel
> >>>> PMU driver doesn't support sampling on topdown metrics events.
> >>>>
> >>>> A topdown metrics event only needs to be in a group that has the slots
> >>>> event as its leader. It does not need to come directly after the slots
> >>>> event. Moving topdown metrics events to directly after the slots event
> >>>> during regrouping is therefore overkill, and it causes the above
> >>>> issue.
> >>>>
> >>>> Thus delete the code that moves topdown metrics events to fix the
> >>>> issue.
> >>> I think this is wrong. The topdown events may not be in a group, such
> >>> cases can come from metrics due to grouping constraints, and so they
> >>> must be sorted together so that they may be gathered into a group to
> >>> avoid the perf event opens failing for ungrouped topdown events. I'm
> >>> not understanding what these patches are trying to do, if you want to
> >>> prioritize the event for leader sampling why not modify it to compare
> >> Per my understanding, this change doesn't break anything. Event
> >> regrouping can be divided into the following cases.
> >>
> >> a. all events in a group
> >>
> >> perf stat -e "{instructions,topdown-retiring,slots}" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >>  Performance counter stats for 'CPU(s) 0':
> >>
> >>         15,066,240      slots
> >>          1,899,760      instructions
> >>          2,126,998      topdown-retiring
> >>
> >>        1.045783464 seconds time elapsed
> >>
> >> In this case, the slots event is adjusted to be the leader event and all
> >> events are still in the same group.
> >>
> >> b. all events not in a group
> >>
> >> perf stat -e "instructions,topdown-retiring,slots" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >>  Performance counter stats for 'CPU(s) 0':
> >>
> >>          2,045,561      instructions
> >>         17,108,370      slots
> >>          2,281,116      topdown-retiring
> >>
> >>        1.045639284 seconds time elapsed
> >>
> >> In this case, slots and topdown-retiring are placed into a group and slots
> >> is the group leader. The instructions event is outside the group.
> >>
> >> c. slots event in group but topdown metric events outside the group
> >>
> >> perf stat -e "{instructions,slots},topdown-retiring"  -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >>  Performance counter stats for 'CPU(s) 0':
> >>
> >>         20,323,878      slots
> >>          2,634,884      instructions
> >>          3,028,656      topdown-retiring
> >>
> >>        1.045076380 seconds time elapsed
> >>
> >> In this case, the topdown-retiring event is placed into the previous group
> >> and slots is adjusted to be the leader event.
> >>
> >> d. multiple event groups
> >>
> >> perf stat -e "{instructions,slots},{topdown-retiring}"  -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>
> >>  Performance counter stats for 'CPU(s) 0':
> >>
> >>         26,319,024      slots
> >>          2,427,791      instructions
> >>          2,683,508      topdown-retiring
> >>
> >>        1.045495830 seconds time elapsed
> >>
> >> In this case, the two groups are merged into one group and the slots event
> >> is adjusted to be the leader.
> >>
> >> The key point of this patch is that it's unnecessary to move topdown
> >> metrics events to directly after the slots event. It's overkill, since the
> >> Intel core PMU driver doesn't require that. The Intel PMU driver only
> >> requires that topdown metrics events are in a group where the slots event
> >> is the group leader. Worse, moving the topdown metrics events causes the
> >> issue described in the commit message.
> >>
> >> This patch doesn't block regrouping of topdown metrics events. It just
> >> removes the unnecessary movement of topdown metrics events.
> > But you will get the same behavior because of the non-arch dependent
> > force group index - I guess you don't care as the sample read only
> > happens when you have a group.
> >
> > I'm thinking of cases like (which admittedly is broken):
> > ```
> > $ perf stat -e "{slots,instructions},cycles,topdown-fe-bound" -a sleep 0.1
> > [sudo] password for irogers:
> >
> > Performance counter stats for 'system wide':
> >
> >     2,589,345,900      slots
> >       852,492,838      instructions
> >       583,525,372      cycles
> >   <not supported>      topdown-fe-bound
> >
> >       0.103930790 seconds time elapsed
> > ```
>
> I ran the upstream code (commit 73e931504f8e0d42978bfcda37b323dbbd1afc08)
> without this patchset, and I see the same issue.
>
> perf stat -e "{slots,instructions},cycles,topdown-fe-bound" -a sleep 0.1
>
>  Performance counter stats for 'system wide':
>
>        262,448,922      slots
>         29,630,373      instructions
>         43,891,902      cycles
>    <not supported>      topdown-fe-bound
>
>        0.150369560 seconds time elapsed
>
> #perf -v
> perf version 6.10.rc6.g73e931504f8e
>
> This issue is not caused by this patchset.

I agree, but I think what is broken above was caused by the forced
grouping change that I did for Andi. The point of your change here is
to say that topdown events don't need to be moved while sorting, but
what should be happening here is just that. topdown-fe-bound should be
moved into the group with slots and instructions so it isn't "<not
supported>". So yes the current code has issues, but that's not the
same as saying we don't need to move topdown events, we do so that we
may group them better.
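
To make the ordering I mean concrete, here is a minimal sketch. It is not the
perf source: struct toy_evsel, toy_evsel_cmp() and toy_sort() are hypothetical
stand-ins, the two bool fields stand in for arch_is_topdown_slots() and
arch_is_topdown_metrics(), and it uses qsort() on an array rather than the
list sort perf uses. It also deliberately ignores the group sort index check
quoted below, so it only shows the arch-level rule: slots first, then topdown
metrics, then insertion order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for perf's struct evsel. */
struct toy_evsel {
	const char *name;
	int idx;                /* insertion index, assumed unique */
	bool is_slots;          /* stand-in for arch_is_topdown_slots() */
	bool is_topdown_metric; /* stand-in for arch_is_topdown_metrics() */
};

/* Order: slots event first, then topdown metrics, then insertion index. */
static int toy_evsel_cmp(const void *a, const void *b)
{
	const struct toy_evsel *lhs = a, *rhs = b;

	if (lhs->is_slots != rhs->is_slots)
		return lhs->is_slots ? -1 : 1;
	if (lhs->is_topdown_metric != rhs->is_topdown_metric)
		return lhs->is_topdown_metric ? -1 : 1;
	/* Default ordering by insertion index. */
	return lhs->idx - rhs->idx;
}

/* Sort an array of toy events in place using the comparator above. */
static void toy_sort(struct toy_evsel *evsels, size_t n)
{
	assert(evsels != NULL);
	qsort(evsels, n, sizeof(*evsels), toy_evsel_cmp);
}
```

With the four events from the example (slots, instructions, cycles,
topdown-fe-bound, in that insertion order) this sketch sorts to slots,
topdown-fe-bound, instructions, cycles, which is the adjacency needed before
topdown-fe-bound can be gathered into the slots group.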

Thanks,
Ian

> > As the slots event is grouped there's no force group index on it, we
> > want to shuffle the topdown-fe-bound into the group so we want it to
> > compare as less than cycles - ie we're comparing topdown events with
> > non topdown events and trying to shuffle the topdown events first.
>
> Current evlist__cmp() won't really swap the order of cycles and
> topdown-fe-bound.
>
> if (lhs_sort_idx != rhs_sort_idx)
>         return lhs_sort_idx - rhs_sort_idx;
>
> When comparing cycles and topdown-fe-bound events, lhs_sort_idx is 2 and
> rhs_sort_idx is 3, so the swap won't happen.
>
> So the event sequence after sorting is still "slots, instructions, cycles,
> topdown-fe-bound". Neither cycles nor topdown-fe-bound is placed
> into the group.
>
>
> >
> > Thanks,
> > Ian
> >
> >
> >
> >>> first?
> >>>
> >>> Thanks,
> >>> Ian
> >>>
> >>>> Signed-off-by: Dapeng Mi <dapeng1.mi@...ux.intel.com>
> >>>> ---
> >>>>  tools/perf/arch/x86/util/evlist.c | 5 -----
> >>>>  1 file changed, 5 deletions(-)
> >>>>
> >>>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> >>>> index 332e8907f43e..6046981d61cf 100644
> >>>> --- a/tools/perf/arch/x86/util/evlist.c
> >>>> +++ b/tools/perf/arch/x86/util/evlist.c
> >>>> @@ -82,11 +82,6 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> >>>>                         return -1;
> >>>>                 if (arch_is_topdown_slots(rhs))
> >>>>                         return 1;
> >>>> -               /* Followed by topdown events. */
> >>>> -               if (arch_is_topdown_metrics(lhs) && !arch_is_topdown_metrics(rhs))
> >>>> -                       return -1;
> >>>> -               if (!arch_is_topdown_metrics(lhs) && arch_is_topdown_metrics(rhs))
> >>>> -                       return 1;
> >>>>         }
> >>>>
> >>>>         /* Default ordering by insertion index. */
> >>>> --
> >>>> 2.40.1
> >>>>