Message-ID: <CAP-5=fXdbbx28pO3K8ARky+zEGZo16jZU-oEjHTAJtwvjx7jYw@mail.gmail.com>
Date: Fri, 7 Mar 2025 14:10:13 -0800
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: James Clark <james.clark@...aro.org>, Namhyung Kim <namhyung@...nel.org>, 
	linux-perf-users@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>, 
	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>, 
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC] perf tools: About encodings of legacy event names

On Fri, Mar 7, 2025 at 10:48 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Fri, Mar 7, 2025 at 7:10 AM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
> >
> > On Fri, Mar 07, 2025 at 02:17:22PM +0000, James Clark wrote:
> > > On 24/02/2025 3:01 pm, Arnaldo Carvalho de Melo wrote:
> > > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote:
> > > > > I knew of this tech debt and separately RISC-V was also interested to
> > > > > have sysfs/json be the priority so that the legacy to config encoding
> > > > > could exist more in the perf tool than the PMU driver. I'm a SIG
> >
> > > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as
> > > > they didn't want to break the perf tooling workflow, no?
> >
> > > Doesn't most of the discussion stem from this particular point? I also
> > > understood it that way, that risc-v folks agreed it was better to support
> > > these to make all existing software work, not just Perf.
> >
> > That is my understanding, and I agree with them and with you.
>
> This is describing what RISC-V have been forced into doing:
> 1) to support non-perf tooling,
> 2) because perf is inconsistent in priority with legacy and
> sysfs/json events.
>
> Their preference has been to move these problems into the tool not the
> PMU driver. What you are saying here is to ignore their preference.
> I've already quoted them in this thread saying this, but this keeps
> being ignored. Here is my previous message:
> https://lore.kernel.org/lkml/CAP-5=fXSgpZaAgickZSWgjt-2iTWK7FFZc65_HG3QhrTg1mtBw@mail.gmail.com/
>
> > > Maybe one issue was calling them 'legacy' events in the first place, and I'm
> > > not sure if there is complete consensus that these are legacy.
> >
> > I don't see them as "legacy".
>
> So let me say this is really distracting from the intent in the
> series. The series is:
> 1) trying to clean up wild carding ambiguity - not making it dependent
> on the name of the event being parsed, the behavior of `cpu_cycles`
> matches that of `cpu-cycles`
> 2) trying to make the legacy vs sysfs/json prioritization consistent -
> making it so that `cpu_core/instructions/` encoding matches
> `instructions` as we display both of these as cpu_core/instructions/
> and it is confusing to a user that different encodings were used. We
> also pattern match perf_event_attr config values in places like:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/topdown.c?h=perf-tools-next#n38
> so >1 config for the same event means such pattern matching needs to
> consider all cases.
>
> There is now a "Make Legacy Events Great Again" (MLEGA) effort that
> is standing in the way of clean up work. As already stated but
> repeating, why is MLEGA a bad thing:
> 1) legacy events lack descriptions and are open for interpretation.
> For example, do the events include counts for things done
> speculatively?
> 2) it is unneeded. Vendors can choose to name events the same name in
> sysfs and json. ARM are achieving pretty much all of the same thing
> with architecture standard events but in their use they will have
> appropriate event descriptions for each model giving all the caveats
> for the event. When something is common we can encode it in the common
> json we don't need legacy events for this:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common?h=perf-tools-next
> 3) LLC doesn't mean L2, it nearly always means L3, the event names
> have become obsolete and confusing. More MLEGA means more of this.
> 4) PMUs have only ever supported a subset of the legacy events. We
> have to make the use of legacy events in `perf stat` not fail when they
> are implicitly added as default events and via the -ddd options.
> 5) multiple encodings/PMU types for the same thing complicates things
> like topdown event ordering, that is a kernel/PMU restriction, and
> metric event deduplication.
> 6) legacy events are broken on ARM Apple-M and have been broken on Juno boards.

To be clear: if legacy events have priority and the encoding is broken
in the driver, then either the tool needs workarounds for specific
models or the user needs to specify a PMU, and we complicate event
config matching, etc. If the event encoding is broken in the json, we
simply fix the json, with no need to introduce additional tool
complications. Besides a potential consistency of names across models,
something that has never existed because PMUs do not implement most
legacy events and because of ambiguity over what an event is, I see no
advantage to legacy events and major drawbacks.
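
As an illustration of the config matching concern, here is a minimal
sketch (not code from perf). The type and hardware-id constants are the
values in include/uapi/linux/perf_event.h; the `Attr` class, the helper
names, and the assumption that the PMU has type 4 and advertises
`instructions` as `event=0xc0` in sysfs are hypothetical and only serve
the example.

```python
from dataclasses import dataclass

# Values as defined in include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0
PERF_TYPE_RAW = 4
PERF_COUNT_HW_INSTRUCTIONS = 1


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for the (type, config) pair of perf_event_attr."""
    type: int
    config: int


# Legacy encoding: the PMU driver translates the generic hardware id.
LEGACY_INSTRUCTIONS = Attr(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS)

# sysfs/json encoding. Assumption for illustration only: a PMU of type 4
# whose sysfs events directory describes instructions as event=0xc0.
SYSFS_INSTRUCTIONS = Attr(PERF_TYPE_RAW, 0xC0)


def matches_sysfs_only(attr: Attr) -> bool:
    """Matching is one comparison when an event has a single encoding."""
    return attr == SYSFS_INSTRUCTIONS


def matches_any_encoding(attr: Attr) -> bool:
    """With more than one encoding, every matcher has to list all of them."""
    return attr in (LEGACY_INSTRUCTIONS, SYSFS_INSTRUCTIONS)
```

A matcher written against only one encoding silently misses the other,
which is the kind of complication that a single consistent priority
would avoid.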

Thanks,
Ian

> 7) architectures trying to push complexity into user land (RISC-V) are
> being forced to push it into the kernel/driver.
>
> Is MLEGA relevant here? Well if you want legacy events to be >
> sysfs/json then yes. For wild carding I don't see why MLEGA cares. Do
> I want to push on MLEGA? No, and I think the reasons above are why it
> hasn't happened in over 10 years.
>
> > > Can't they continue to be the short easy list of events likely to be common across
> > > platforms?
> >
> > That is my understanding of the original intent, yes.
> >
> > A first approximation, those who want to dig deeper, well, learn more
> > about the architecture, learn about the extensive support for
> > vendor/JSON events, sysfs ones, how to properly configure them taking
> > advantage of the high level of flexibility both perf, the tool and perf
> > the kernel subsystem allows them to be used, in groups, leader sampling,
> > multiplexing or not, etc.
> >
> > But lots of developers seem to be OK with just the default events or
> > using those aliases for expected events across architectures, sometimes
> > specifying :ppp as a hint that if there are more precise events in this
> > architecture, please use them, for instance.
>
> When and where have I said that I don't want to support events like
> instructions and cycles? See above, consistent wild carding and the
> encoding priority are the only issues here.
>
> > > If there is an issue with some of them being wrong in some places
> > > we can move forward from that by making sure new platforms do it right,
> >
> > And adding special case for broken things when we know that some event
> > named "cycles" shouldn't be used for sampling, for instance.
>
> What is this? A new framework for special casing PMUs and events,
> where we're maintaining lists of broken PMUs and changing encodings?
> And tooling like event sorting, metrics, is all supposed to just work
> with this? Are we going to write json for this? Who is writing/testing
> it for Apple-M?
>
> Special cases should be the exception and not an expected norm.
>
> > > rather than changing the logic for everyone to fix that bug.
> >
> > Right. And again, if something doesn't work for a while in some
> > architecture, it's just a matter of specifying the name of the event in
> > full form, with the PMU prefix, etc.
>
> So MLEGA would like sysfs/json when they are broken? This is just
> silly, if something is broken we should just not use it. Having 2 ways
> of stating something and expecting different behaviors from them is
> clearly brittle.
>
> > > For the argument that Google prefers to use the sysfs events because of
> > > these differences, I don't think there is anything preventing that kind of
> > > use today?
> >
> > Indeed.
>
> I explained that in the context of why legacy events are wrong. I've
> repeated it above. This is not addressing the issues of wild carding
> and the encoding priority.
>
> > > Or at least not for the main priority flip proposed, but maybe
> > > there are some smaller adjacent bugs that can be fixed up separately.
> >
> > Yes, and work in this area is greatly appreciated.
>
> I don't know what your proposals are and to my eyes none of them have
> ever existed, no one has created them in over 10 years.
> I am trying to fix wild carding and the encoding priority.
> As for bike shedding on MLEGA, can we please move it to a separate email thread?
>
> Thanks,
> Ian
