[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fV13PK23eTmmCoW6-NbAKmmMPUX_=eg60aexzyPg_a0Ug@mail.gmail.com>
Date: Mon, 19 Aug 2024 08:44:08 -0700
From: Ian Rogers <irogers@...gle.com>
To: James Clark <james.clark@...aro.org>
Cc: Thorsten Leemhuis <regressions@...mhuis.info>, Arnaldo Carvalho de Melo <acme@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Linux perf Profiling <linux-perf-users@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, James Clark <james.clark@....com>,
"cc: Marc Zyngier" <maz@...nel.org>, Hector Martin <marcan@...can.st>, Asahi Linux <asahi@...ts.linux.dev>,
Linux regressions mailing list <regressions@...ts.linux.dev>, Atish Patra <atishp@...osinc.com>
Subject: Re: [REGRESSION] Perf (userspace) broken on big.LITTLE systems since v6.5
On Mon, Aug 19, 2024 at 7:56 AM James Clark <james.clark@...aro.org> wrote:
>
>
>
> On 16/08/2024 4:30 pm, Ian Rogers wrote:
> > On Fri, Aug 16, 2024 at 2:23 AM James Clark <james.clark@...aro.org> wrote:
> >>
> >>
> >>
> >> On 15/08/2024 6:29 pm, Ian Rogers wrote:
> >>> On Wed, Aug 14, 2024 at 9:28 AM James Clark <james.clark@...aro.org> wrote:
> >>>> On 07/08/2024 9:54 am, Thorsten Leemhuis wrote:
> >>>>> On 01.08.24 21:05, Ian Rogers wrote:
> >>>>>> On Wed, Dec 6, 2023 at 4:09 AM Linux regression tracking #update
> >>>>>> (Thorsten Leemhuis) <regressions@...mhuis.info> wrote:
> >>>>>>>
> >>>>>>> [TLDR: This mail in primarily relevant for Linux kernel regression
> >>>>>>> tracking. See link in footer if these mails annoy you.]
> >>>>>>>
> >>>>>>> On 22.11.23 00:43, Bagas Sanjaya wrote:
> >>>>>>>> On Tue, Nov 21, 2023 at 09:08:48PM +0900, Hector Martin wrote:
> >>>>>>>>> Perf broke on all Apple ARM64 systems (tested almost everything), and
> >>>>>>>>> according to maz also on Juno (so, probably all big.LITTLE) since v6.5.
> >>>>>>>
> >>>>>>> #regzbot fix: perf parse-events: Make legacy events lower priority than
> >>>>>>> sysfs/JSON
> >>>>>>> #regzbot ignore-activity
> >>>>>>
> >>>>>> Note, this is still broken.
> >>>>>
> >>>>> Hmmm, so all that became somewhat messy. Arnaldo, what's the way out of
> >>>>> this? Or is this a "we are screwed one way or another and someone has to
> >>>>> bite the bullet" situation?
> >>>>>
> >>>>> Ciao, Thorsten
> >>>>>
> >>>>>> The patch changed the priority in the case
> >>>>>> that you do something like:
> >>>>>>
> >>>>>> $ perf stat -e 'armv8_pmuv3_0/cycles/' benchmark
> >>>>>>
> >>>>>> but if you do:
> >>>>>>
> >>>>>> $ perf stat -e 'cycles' benchmark
> >>>>>>
> >>>>>> then the broken behavior will happen as legacy events have priority
> >>>>>> over sysfs/json events in that case. To fix this you need to revert:
> >>>>>> 4f1b067359ac Revert "perf parse-events: Prefer sysfs/JSON hardware
> >>>>>> events over legacy"
> >>>>>>
> >>>>>> This causes some testing issues resolved in this unmerged patch series:
> >>>>>> https://lore.kernel.org/lkml/20240510053705.2462258-1-irogers@google.com/
> >>>>>>
> >>>>>> There is a bug as the arm_dsu PMU advertises an event called "cycles"
> >>>>>> and this PMU is present on Ampere systems. Reverting the commit above
> >>>>>> will cause an issue as the commit 7b100989b4f6 ("perf evlist: Remove
> >>>>>> __evlist__add_default") to fix ARM's BIG.little systems (opening a
> >>>>>> cycles event on all PMUs not just 1) will cause the arm_dsu event to
> >>>>>> be opened by perf record and fail as the event won't support sampling.
> >>>>>>
> >>>>>> The patch https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@google.com/
> >>>>>> fixes this by only opening the cycles event on core PMUs when choosing
> >>>>>> default events.
> >>>>>>
> >>>>>> Rather than take this patch the revert happened as Linus runs the
> >>>>>> command "perf record -e cycles:pp" (ie using a specified event and not
> >>>>>> defaults) and considers it a regression in the perf tool that on an
> >>>>>> Ampere system to need to do "perf record -e
> >>>>>> 'armv8_pmuv3_0/cycles/pp'". It was pointed out that not specifying -e
> >>>>>> will choose the cycles event correctly and with better precision the
> >>>>>> pp for systems that support it, but it was still considered a
> >>>>>> regression in the perf tool so the revert was made to happen. There is
> >>>>>> a lack of perf testing coverage for ARM, in particular as they choose
> >>>>>> to do everything in a different way to x86. The patch in question was
> >>>>>> in the linux-next tree for weeks without issues.
> >>>>>>
> >>>>>> ARM/Ampere could fix this by renaming the event from cycles to
> >>>>>> cpu_cycles, or by following Intel's convention that anything uncore
> >>>>>> uses the name clockticks rather than cycles. This could break people
> >>>>>> who rely on an event called arm_dsu/cycles/ but I imagine such people
> >>>>>> are rare. There has been no progress I'm aware of on renaming the
> >>>>>> event.
> >>>>>>
> >>>>>> Making perf not terminate on opening an event for perf record seems
> >>>>>> like the most likely workaround as that is at least something under
> >>>>>> the tool maintainers control. ARM have discussed doing this on the
> >>>>>> lists:
> >>>>>> https://lore.kernel.org/lkml/f30f676e-a1d7-4d6b-94c1-3bdbd1448887@arm.com/
> >>>>>> but since the revert in v6.10 no patches have appeared for the v6.11
> >>>>>> merge window. Feature work like coresight improvements and ARMv9 are
> >>>>>> being actively pursued by ARM, but feature work won't resolve this
> >>>>>> regression.
> >>>>>>
> >>>>
> >>>> I got some hardware with the DSU PMU so I'm going to have a go at trying
> >>>> to send some fixes for this. My initial idea was to try incorporate the
> >>>> "not terminate on opening" change as discussed in the link directly
> >>>> above. And then do the revert of the "revert of prefer sysfs/json".
> >>>
> >>> Thanks, I think this would be good. The biggest issue is that none of
> >>> the record logic expects a file descriptor to be not opened, deleting
> >>> unopened evsels from the evlist breaks all the indexing into the
> >>> mmaps, etc. Tbh, you probably wouldn't do the code this way if was
> >>> written afresh. Perhaps a hashmap would map from an evsel to ring
> >>> buffer mmaps, etc. Trying to avoid having global state and benefitting
> >>> from encapsulation. I'd focus on just doing the expedient thing in the
> >>> changes, which probably just means making the record code tolerant of
> >>> evsels that fail to open and not modifying the evlist due to the risk
> >>> it breaks the indices.
> >>>
> >>
> >> Thanks for the tips.
> >>
> >>> (To point out the obvious, this work wouldn't be necessary if arm_dsu
> >>> event were renamed from "cycles" to "cpu_cycles" which would also make
> >>> it more intention revealing alongside the arm_dsu's "bus_cycles" event
> >>> name).
> >>>
> >>
> >> I understand but I can imagine the following conversation if we rename that:
> >>
> >> User: "I updated my kernel and now my (non Perf) tool fails to open
> >> the DSU cycles event because it doesn't exist anymore"
> >>
> >> Linus/maintainers: "Oh ok yes that was a userspace breaking change,
> >> lets revert it"
> >>
> >> Just because Perf can handle 3 different names for cycles doesn't mean
> >> other tools can.
> >
> > cycles was a bad event name, dsu is a terrible name for what is mainly
> > the l3 cache, the risk that the two are combined get broken I'm fine
> > with as neoverse users with uncore permissions are say much rarer than
> > Apple M users. Having a cycles and a bus_cycles event is already
> > ambiguous, they sound the same. Renaming cycles to cpu_cycles would be
> > best.
> >
> >>>> FWIW I don't think Juno currently is broken if the kernel supports
> >>>> extended type ID? I could have missed some output in this thread but it
> >>>> seems like it's mostly related to Apple M hardware. I'm also a bit
> >>>> confused why the "supports extended type" check fails there, but maybe
> >>>> the v6.9 commit 25412c036 from Mark is missing?
> >>>
> >>> So I think your later emails clarify Arnaldo is probably missing:
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/commit/drivers/perf/arm_pmu.c?h=perf-tools-next&id=5c816728651ae425954542fed64d21d40cb75a9f
> >>>
> >>> Fwiw, the Apple M hardware issue came to me by way of Mark Rutland
> >>> (iirc), this regression report, etc. My understanding is that Apple M
> >>> has something like a v2 ARM PMU and the legacy events are encoded
> >>> incorrectly in the driver for this. The regression in v6.5 happened
> >>
> >> I'm not sure about that. The M PMU events may be incomplete, but the two
> >> that are there have a mapping that looks sane:
> >>
> >> static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
> >> PERF_MAP_ALL_UNSUPPORTED,
> >> [PERF_COUNT_HW_CPU_CYCLES] = M1_PMU_PERFCTR_CPU_CYCLES,
> >> [PERF_COUNT_HW_INSTRUCTIONS] = M1_PMU_PERFCTR_INSTRUCTIONS,
> >> /* No idea about the rest yet */
> >> };
> >>
> >> And they map to the same named events:
> >>
> >> static struct attribute *m1_pmu_event_attrs[] = {
> >> M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES),
> >> M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS),
> >> NULL,
> >> };
> >>
> >> So in this case I can't see using legacy vs sysfs events making a
> >> difference. Maybe there is some other case that was mentioned in a
> >> previous thread that I missed though.
> >
> > No idea, iirc Mark Rutland requested not to use legacy events for Apple M.
> >
>
> The point I was trying to make here was that there isn't _technically_
> any user facing bug on Apple M with both a new kernel and new perf,
> despite the issues Mark mentioned.
>
> I think there's a bit more subtlety in Mark's request. Using sysfs is
> only required for old kernels that don't support extended type ID, and
> it's not specific to apple M, that's for everywhere. The other case he
> mentioned was when the events are slightly different but with the same
> name as legacy, which isn't the case here specifically but is already
> fixed by ("perf parse-events: Make legacy events lower priority than
> sysfs/JSON") (v6.8).
>
> >>> because ARM's core PMUs had previously been treated as uncore PMUs,
> >>> meaning we wouldn't try to program legacy events on them. Fixing the
> >>> handling of ARM's core PMUs broke Apple M due to the broken legacy
> >>> event mappings. Why not fix the Apple M PMU driver? Well there was
> >>> anyway a similar RISC-V issue reported by Atish Patra (iirc) where the
> >>> RISC-V PMU driver wants to delegate the mapping of legacy events to
> >>> the perf tool so the driver needn't be aware of all and future RISC-V
> >>> configurations. The fix discussed with Mark, Atish, etc. has been to
> >>> swap the priority of legacy and sysfs/json events so that the latter
> >>> has priority. We need the revert of the revert as currently we only do
> >>> this if a PMU is specified with an event, not for the general wildcard
> >>> PMUs case that most people use. There was huge fallout from flipping
> >>
> >> Yep makes sense to do the revert if RISC-V isn't going to support any
> >> legacy events. Although from what I understand that would technically
> >> only require JSON to be the highest priority? Because putting named
> >> events in sysfs still requires kernel involvement so doesn't get you any
> >> further than supporting the legacy events?
> >
> > The sysfs and json event handling is interwoven, for example you can
> > add to a sysfs event with json information. There are basically two
> > approaches in the event parser, hardcoded legacy things and event
> > names (optionally with PMU names). I'm trying to get rid of the
> > hardcoded legacy things as they were fine when you had a single core
> > type, but I want to have events everywhere - say instructions and
> > cycles on a GPU so we can IPC on a GPU. For RISC-V as long as the
> > legacy events are covered as names in json and json/sysfs has priority
> > over legacy then things will be fine.
> >
> >> Seems like there is another reason to do the revert though as Mark
> >> mentioned: That now directly specifying the PMU eg "-e
> >> arm_cortex_a56/cycles/" opens a legacy event if the event matches one,
> >> which is not the best thing to do. But the revert fixes this AFAIK, so
> >> while having the priority JSON/legacy/sysfs might work for RISC-V it
> >> wouldn't work for a platform that wants a slightly different sysfs event
> >> than legacy but with the same name. And the priority should be
> >> JSON/sysfs/legacy.
> >
> > The priority for events with a PMU is the sysfs/json has a priority
> > over legacy names, so I don't understand what you're saying here. Your
> > example shouldn't be broken. The revert is for the case where no PMU
> > is specified, where the priority is the opposite which is at best
> > inconsistent.
> >
>
> Yep you're right, I got confused with the original bug report which is
> now old. With commit a24d9d9dc ("perf parse-events: Make legacy events
> lower priority than sysfs/JSON") (v6.8) named PMUs do prioritize sysfs.
>
> >>> the priority particularly on Intel as all test expectations needed
> >>> updating. I've sent out similar fixes that need incorporating when the
> >>> revert is reverted. Ideally tools/perf/tests/parse-events.c would be
> >>> updated to cover ARM's PMUs that don't follow the normal pattern that
> >>> the core PMU is called "cpu" (this would mean that we were testing
> >>> event parsing on ARM was WAI wrt encoding priorities, BIG.little,
> >>> etc).
> >>>
> >>>> I sent a small fix the other day to make perf stat default arguments
> >>>> work on Juno, and didn't notice anything out of the ordinary:
> >>>> https://lore.kernel.org/linux-perf-users/dac6ad1d-5aca-48b4-9dcb-ff7e54ca43f6@linaro.org/T/#t
> >>>> I agree that change is quite narrow but it does incrementally improve
> >>>> things for the time being. It's possible that it would become redundant
> >>>> if I can just include Ian's change to use strings for Perf stat.
> >>>
> >>> I'd prefer we didn't merge this as we'd need to rebase:
> >>> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
> >>> and those changes would then delete the code introduced. I'm fine with
> >>> adding the tests.
> >>>
> >>> There are more exotic heterogeneous core things upcoming, probably
> >>> also from ARM, and the thought of duplicating the default attribute
> >>> logic and event parsing constraints is just something I'd prefer not
> >>> to have to do.
> >>>
> >>
> >> Yep I don't have any strong feelings about this. Even if we don't merge
> >> it it helped me understand the code and the issue a bit.
> >>
> >> I think one thing I assumed about your change was that there was some
> >> dependency on these other changes. But the more I look at it I think
> >> it's actually fine on it's own?
> >
> > Which change? If the change is trying to use "cycles" to open on all
> > PMUs because it will be wild carded then it will run into the priority
> > issue.
> >
>
> Just patch 3 here:
> https://lore.kernel.org/lkml/20240510053705.2462258-4-irogers@google.com/
>
> I assume it works because we don't open on uncore right now. But I'm
> still rebasing and testing it. So we could merge that, and then when we
> do the priority revert along with the fix to ignore the DSU error it
> will continue to work.
>
> >> Using the cycles string actually works today, even on Apple M. The only
> >> real remaining issue is softening the error for failure to open, but
> >> that's _after_ doing the revert of the revert and is separate.
> >>
> >> I will re-test that one today with fresh eyes.
> >
> > Perhaps it is other legacy events, not cycles and instructions. There
> > must have been a reason for this regression report but I don't have an
> > Apple M CPU to test on.
> >
>
> This regression report is for various (admittedly extremely confusing)
> combinations of kernels and perfs without the following patches:
>
> 5c81672865 ("arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability")
> (v6.6 kernel release)
>
> 25412c036 ("perf print-events: make is_event_supported() more robust")
> (v6.9 Perf release for Apple M)
>
> a24d9d9dc ("perf parse-events: Make legacy events lower priority than
> sysfs/JSON")
> (v6.8 Perf)
>
> With all of those applied everything is fixed even on Apple M. I don't
> think anything needs to be fixed for the bare "-e cycles" that you
> mentioned at the beginning of the chain because that never regressed, it
> actually never worked on big.LITTLE until 5c81672865, and after that
> using legacy was fine. I don't think Mark actually wants bare "cycles"
> to _not_ use legacy either because it never did. He only mentioned what
> happens when you really do want to target a PMU with a name (already
> fixed in a24d9d9dc).
I'm not clear, is your point that when we get regression reports on
the tool like this and Mark says things to me face-to-face at LPC we
should ignore the issue and wait for the driver fix? The PMU driver
for Apple M has fixed the legacy defaults for instructions and cycles,
great - this was the obvious fix for a driver issue from the get go.
Has it fixed all legacy values? Are you saying we should flip from
sysfs/json preferred over legacy to legacy preferred over sysfs json?
I still would like to get rid of legacy events having different wild
card behavior, cpu-cycles (legacy - matches only core PMUs) vs
cpu_cycles (sysfs - matches on all PMUs) but if we need to carry this
awkwardness for the sake of arm_dsu then *sigh* ok, it'll be forever a
potential trap when writing metrics - beware magic legacy names that
won't work on anything other than core PMUs. We carry lots of other
discrepancies around for things like arbitrary hex cut off values to
work around PMU suffix naming (5fabcdef vs a53 - both hex suffixes
with different interpretations), hotplug handling, etc. One concern
that's been raised is other tools being able to work correctly, given
the minefield set up in this regard I can imagine legacy events
working but little else. At least we can work to have the reference
implementation that comes with the kernel working.
Thanks,
Ian
Powered by blists - more mailing lists