Message-ID: <CAP-5=fWUVycpDss_+MNQ_DM93AYKWED8aYOUBKLziYTOn68QJA@mail.gmail.com>
Date: Wed, 1 Oct 2025 13:55:36 -0700
From: Ian Rogers <irogers@...gle.com>
To: James Clark <james.clark@...aro.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Kan Liang <kan.liang@...ux.intel.com>,
Xu Yang <xu.yang_2@....com>, Thomas Falcon <thomas.falcon@...el.com>,
Andi Kleen <ak@...ux.intel.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, Atish Patra <atishp@...osinc.com>,
Beeman Strong <beeman@...osinc.com>, Leo Yan <leo.yan@....com>,
Vince Weaver <vincent.weaver@...ne.edu>
Subject: Re: [PATCH v6 00/28] Legacy hardware/cache events as json
On Wed, Oct 1, 2025 at 8:12 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Wed, Oct 1, 2025 at 6:38 AM James Clark <james.clark@...aro.org> wrote:
> >
> >
> >
> > On 23/09/2025 11:32 pm, Ian Rogers wrote:
> > > Mirroring similar work for software events in commit 6e9fa4131abb
> > > ("perf parse-events: Remove non-json software events"). These changes
> > > migrate the legacy hardware and cache events to json. With no hard
> > > coded legacy hardware or cache events the wild card, case
> > > insensitivity, etc. is consistent for events. This does, however, mean
> > > events like cycles will wild card against all PMUs. A change doing the
> > > same was originally posted and merged from:
> > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> > > parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> > > his dislike for the cycles behavior on ARM with perf record. Earlier
> > > patches in this series make perf record event opening failures
> > > non-fatal and hide the cycles event's failure to open on ARM in perf
> > > record, so it is expected the behavior will now be transparent in perf
> > > record on ARM. perf stat with a cycles event will wildcard open the
> > > event on all PMUs, however, with default events the cycles event will
> > > only be opened on core PMUs.
> > >
> > > The key motivation for these patches is so that if, for example, you
> > > run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
> > > printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
> > > the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ will be the
> > > same, similarly for the cpu_atom event. Prior to these patches the
> > > event with a PMU prefers sysfs/json over legacy encodings while with
> > > no PMU legacy encodings are preferred - these are different encodings
> > > on x86.
> > >
> > > The change to support legacy events with PMUs was done to clean up
> > > Intel's hybrid PMU implementation. Having sysfs/json events with
> > > increased priority to legacy was requested by Mark Rutland
> > > <mark.rutland@....com> to fix Apple-M PMU issues wrt broken legacy
> > > events on that PMU. It is believed the PMU driver is now fixed, but
> > > this has only been confirmed on ARM Juno boards. It was requested that
> > > RISC-V be able to add events to the perf tool json so the PMU driver
> > > didn't need to map legacy events to config encodings:
> > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > > This patch series achieves this.
> > >
> > > A previous series of patches decreasing legacy hardware event
> > > priorities was posted in:
> > > https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> > > Namhyung Kim <namhyung@...nel.org> mentioned that hardware and
> > > software events can be implemented similarly:
> > > https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> > > and this patch series achieves this.
> > >
> > > Note, patch 2 (perf parse-events: Fix legacy cache events if event is
> > > duplicated in a PMU) fixes a function deleted by patch 17 (perf
> > > parse-events: Remove hard coded legacy hardware and cache
> > > parsing). Adding the json exposed an issue when legacy cache (not
> > > legacy hardware) and sysfs/json events exist. The fix is necessary to
> > > keep tests passing through the series. It is also posted for backports
> > > to stable trees.
> > >
> > > The perf list behavior includes a lot more information and events. The
> > > before behavior on a hybrid alderlake is:
> > > ```
> > > $ perf list hw
> > >
> > > List of pre-defined events (to be used in -e or -M):
> > >
> > > branch-instructions OR branches [Hardware event]
> > > branch-misses [Hardware event]
> > > bus-cycles [Hardware event]
> > > cache-misses [Hardware event]
> > > cache-references [Hardware event]
> > > cpu-cycles OR cycles [Hardware event]
> > > instructions [Hardware event]
> > > ref-cycles [Hardware event]
> > > $ perf list hwcache
> > >
> > > List of pre-defined events (to be used in -e or -M):
> > >
> > >
> > > cache:
> > > L1-dcache-loads OR cpu_atom/L1-dcache-loads/
> > > L1-dcache-stores OR cpu_atom/L1-dcache-stores/
> > > L1-icache-loads OR cpu_atom/L1-icache-loads/
> > > L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
> > > LLC-loads OR cpu_atom/LLC-loads/
> > > LLC-load-misses OR cpu_atom/LLC-load-misses/
> > > LLC-stores OR cpu_atom/LLC-stores/
> > > LLC-store-misses OR cpu_atom/LLC-store-misses/
> > > dTLB-loads OR cpu_atom/dTLB-loads/
> > > dTLB-load-misses OR cpu_atom/dTLB-load-misses/
> > > dTLB-stores OR cpu_atom/dTLB-stores/
> > > dTLB-store-misses OR cpu_atom/dTLB-store-misses/
> > > iTLB-load-misses OR cpu_atom/iTLB-load-misses/
> > > branch-loads OR cpu_atom/branch-loads/
> > > branch-load-misses OR cpu_atom/branch-load-misses/
> > > L1-dcache-loads OR cpu_core/L1-dcache-loads/
> > > L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
> > > L1-dcache-stores OR cpu_core/L1-dcache-stores/
> > > L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
> > > LLC-loads OR cpu_core/LLC-loads/
> > > LLC-load-misses OR cpu_core/LLC-load-misses/
> > > LLC-stores OR cpu_core/LLC-stores/
> > > LLC-store-misses OR cpu_core/LLC-store-misses/
> > > dTLB-loads OR cpu_core/dTLB-loads/
> > > dTLB-load-misses OR cpu_core/dTLB-load-misses/
> > > dTLB-stores OR cpu_core/dTLB-stores/
> > > dTLB-store-misses OR cpu_core/dTLB-store-misses/
> > > iTLB-load-misses OR cpu_core/iTLB-load-misses/
> > > branch-loads OR cpu_core/branch-loads/
> > > branch-load-misses OR cpu_core/branch-load-misses/
> > > node-loads OR cpu_core/node-loads/
> > > node-load-misses OR cpu_core/node-load-misses/
> > > ```
> > > and after it is:
> > > ```
> > > $ perf list hw
> > >
> > > legacy hardware:
> > > branch-instructions
> > > [Retired branch instructions [This event is an alias of branches].
> > > Unit: cpu_atom]
> > > branch-misses
> > > [Mispredicted branch instructions. Unit: cpu_atom]
> > > branches
> > > [Retired branch instructions [This event is an alias of
> > > branch-instructions]. Unit: cpu_atom]
> > > bus-cycles
> > > [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
> > > cache-misses
> > > [Cache misses. Usually this indicates Last Level Cache misses; this is
> > > intended to be used in conjunction with the
> > > PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> > > Unit: cpu_atom]
> > > cache-references
> > > [Cache accesses. Usually this indicates Last Level Cache accesses but
> > > this may vary depending on your CPU. This may include prefetches and
> > > coherency messages; again this depends on the design of your CPU.
> > > Unit: cpu_atom]
> > > cpu-cycles
> > > [Total cycles. Be wary of what happens during CPU frequency scaling
> > > [This event is an alias of cycles]. Unit: cpu_atom]
> > > cycles
> > > [Total cycles. Be wary of what happens during CPU frequency scaling
> > > [This event is an alias of cpu-cycles]. Unit: cpu_atom]
> > > instructions
> > > [Retired instructions. Be careful,these can be affected by various
> > > issues,most notably hardware interrupt counts. Unit: cpu_atom]
> > > ref-cycles
> > > [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
> > > branch-instructions
> > > [Retired branch instructions [This event is an alias of branches].
> > > Unit: cpu_core]
> > > branch-misses
> > > [Mispredicted branch instructions. Unit: cpu_core]
> > > branches
> > > [Retired branch instructions [This event is an alias of
> > > branch-instructions]. Unit: cpu_core]
> > > bus-cycles
> > > [Bus cycles,which can be different from total cycles. Unit: cpu_core]
> > > cache-misses
> > > [Cache misses. Usually this indicates Last Level Cache misses; this is
> > > intended to be used in conjunction with the
> > > PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> > > Unit: cpu_core]
> > > cache-references
> > > [Cache accesses. Usually this indicates Last Level Cache accesses but
> > > this may vary depending on your CPU. This may include prefetches and
> > > coherency messages; again this depends on the design of your CPU.
> > > Unit: cpu_core]
> > > cpu-cycles
> > > [Total cycles. Be wary of what happens during CPU frequency scaling
> > > [This event is an alias of cycles]. Unit: cpu_core]
> > > cycles
> > > [Total cycles. Be wary of what happens during CPU frequency scaling
> > > [This event is an alias of cpu-cycles]. Unit: cpu_core]
> > > instructions
> > > [Retired instructions. Be careful,these can be affected by various
> > > issues,most notably hardware interrupt counts. Unit: cpu_core]
> > > ref-cycles
> > > [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> > > $ perf list hwcache
> > >
> > > legacy cache:
> > > branch-load-misses
> > > [Branch prediction unit read misses. Unit: cpu_atom]
> > > branch-loads
> > > [Branch prediction unit read accesses. Unit: cpu_atom]
> > > dtlb-load-misses
> > > [Data TLB read misses. Unit: cpu_atom]
> > > dtlb-loads
> > > [Data TLB read accesses. Unit: cpu_atom]
> > > dtlb-store-misses
> > > [Data TLB write misses. Unit: cpu_atom]
> > > dtlb-stores
> > > [Data TLB write accesses. Unit: cpu_atom]
> > > itlb-load-misses
> > > [Instruction TLB read misses. Unit: cpu_atom]
> > > l1-dcache-loads
> > > [Level 1 data cache read accesses. Unit: cpu_atom]
> > > l1-dcache-stores
> > > [Level 1 data cache write accesses. Unit: cpu_atom]
> > > l1-icache-load-misses
> > > [Level 1 instruction cache read misses. Unit: cpu_atom]
> > > l1-icache-loads
> > > [Level 1 instruction cache read accesses. Unit: cpu_atom]
> > > llc-load-misses
> > > [Last level cache read misses. Unit: cpu_atom]
> > > llc-loads
> > > [Last level cache read accesses. Unit: cpu_atom]
> > > llc-store-misses
> > > [Last level cache write misses. Unit: cpu_atom]
> > > llc-stores
> > > [Last level cache write accesses. Unit: cpu_atom]
> > > branch-load-misses
> > > [Branch prediction unit read misses. Unit: cpu_core]
> > > branch-loads
> > > [Branch prediction unit read accesses. Unit: cpu_core]
> > > dtlb-load-misses
> > > [Data TLB read misses. Unit: cpu_core]
> > > dtlb-loads
> > > [Data TLB read accesses. Unit: cpu_core]
> > > dtlb-store-misses
> > > [Data TLB write misses. Unit: cpu_core]
> > > dtlb-stores
> > > [Data TLB write accesses. Unit: cpu_core]
> > > itlb-load-misses
> > > [Instruction TLB read misses. Unit: cpu_core]
> > > l1-dcache-load-misses
> > > [Level 1 data cache read misses. Unit: cpu_core]
> > > l1-dcache-loads
> > > [Level 1 data cache read accesses. Unit: cpu_core]
> > > l1-dcache-stores
> > > [Level 1 data cache write accesses. Unit: cpu_core]
> > > l1-icache-load-misses
> > > [Level 1 instruction cache read misses. Unit: cpu_core]
> > > llc-load-misses
> > > [Last level cache read misses. Unit: cpu_core]
> > > llc-loads
> > > [Last level cache read accesses. Unit: cpu_core]
> > > llc-store-misses
> > > [Last level cache write misses. Unit: cpu_core]
> > > llc-stores
> > > [Last level cache write accesses. Unit: cpu_core]
> > > node-load-misses
> > > [Local memory read misses. Unit: cpu_core]
> > > node-loads
> > > [Local memory read accesses. Unit: cpu_core]
> > > ```
> > >
> > > v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
> > > specified. Add patches to make failures in the parse-events test
> > > easier to diagnose. Reorder the perf stat default events patch to
> > > come earlier.
> > >
> > > v5. Add patch for retrying default events, fixing regression when
> > > non-root and paranoid. Make cycles to cpu-cycles test event change
> > > (to avoid non-core ARM events) the default on all architectures
> > > (suggested by Namhyung). Switch all non-test cases to specifying a
> > > PMU. Improvements to the parse-events test including core PMU
> > > parsing support for architectures without a "cpu" PMU.
> > > https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/
> > >
> > > v4: Fixes for matching hard coded metrics in stat-shadow. Make the
> > > default "cycles" event string on ARM "cpu-cycles" which is the
> > > same legacy event but avoids name collisions on ARM PMUs. To
> > > support this, use evlist__new_default for the no command line
> > > event case in `perf record` and `perf top`. Make
> > > evlist__new_default only scan core PMUs.
> > > https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t
> > >
> > > v3: Deprecate the legacy cache events that aren't shown in the
> > > previous perf list to avoid the perf list output being too verbose.
> > > https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
> > >
> >
> > Hi Ian,
> >
> > Did you drop the change to ignore failures to open events in favour of
> > switching the default from "cycles" to "cpu-cycles" instead? I'm trying
> > to follow the changelog but couldn't see it.
>
> Hi James,
>
> Your example is using `perf stat` whilst the behavior to ignore
> failing to open events is for `perf record` and added in:
> https://lore.kernel.org/lkml/20250923223312.238185-6-irogers@google.com/
> Changing the behavior for `perf stat` hasn't been in scope for making
> the behavior of legacy events consistent in any patch series. The
> whole '<not counted>' vs '<not supported>' vs an early exit is
> something where not changing behavior has been the name of the game
> for many years due to potential reliance on the behavior.
>
> Fwiw, as this series migrates legacy events to json, we should be able to
> do the same with the legacy hard coded metrics in `perf stat` once it
> lands. Removing the hard coded metrics will give better metrics as we
> can use event groups, share events between metrics, ... and only
> display requested metrics when metrics are requested. My point is that
> there is further `perf stat` clean up that this work moves us toward
> but at 28 patches already I'd like not to start work on making perf
> stat a better place. Getting consensus on what that better place is
> given the existing fragmented behavior will be a challenge, but at
> least these days we have some form of tests, albeit they tend to flake
> all the time. To solve your problems below where you're specifically
> picking an event that can wildcard, give the PMU or use cpu-cycles, or
> one of the many other names ARM already has to mean this. Overloading
> the event 'cycles' wasn't something introduced in this series and the
> fact changing the name hasn't happened in the drivers is really very
> frustrating.
>
> Thanks,
> Ian
>
>
> > In v3 I got <not supported> for the uncore cycles event, but in v6 I get
> > a complete failure:
> >
> > -> sudo perf-v3 stat -e cycles -- true
> >
> > Performance counter stats for 'true':
> >
> > 1732478 cycles
> >
> > <not supported> arm_cmn_0/cycles/
> >
> >
> > -> sudo perf-v6 stat -e cycles -- true
> > Error:
> > Invalid event (cycles) in per-thread mode, enable system wide with '-a'.
> >
> > The verbose output shows that it tries both, but doesn't ignore the
> > error on arm_cmn_0 anymore:
> >
> > -> sudo perf-v6 stat -e cycles -vvv -- true
> > Control descriptor is not initialized
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 0 (PERF_TYPE_HARDWARE)
> > size 136
> > config 0 (PERF_COUNT_HW_CPU_CYCLES)
> > sample_type IDENTIFIER
> > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > disabled 1
> > inherit 1
> > enable_on_exec 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 9646 cpu -1 group_fd -1 flags 0x8 = 3
> > Opening: cycles
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 11 (arm_cmn_0)
> > size 136
> > Required parameter 'wp_dev_sel' not specified
> > Required parameter 'wp_dev_sel' not specified
> > config 0x3 (cycles)
> > sample_type IDENTIFIER
> > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > disabled 1
> > inherit 1
> > enable_on_exec 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 9646 cpu 0 group_fd -1 flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off exclude_guest for PMU arm_cmn_0
> > Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit,
> > falling back to no-inherit.
> > Warning:
> > cycles event is not supported by the kernel.
> > Error:
> > Invalid event (cycles) in per-thread mode, enable system wide with '-a'.
Re v3 vs v6:
The v3 series has no changes to perf stat:
https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
The v6 series does change perf stat, in the "add_default_events" function:
https://lore.kernel.org/lkml/20250923223312.238185-5-irogers@google.com/
but you are reporting an issue with an explicitly specified event, so
default events are not involved. The evsel changes in v6 are for the
evsel__match function, which isn't used during event opening.
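
To make the wildcard discussion concrete, here is a toy Python model. It is
not perf code and does not claim to describe how either v3 or v6 opens
events. It only illustrates why an event name shared by a core PMU and an
uncore PMU behaves differently depending on whether an open failure is
skipped or treated as fatal. The PMU name "core_pmu", the `per_thread_ok`
flag and the `skip_failures` switch are assumptions made up for this sketch;
only "arm_cmn_0" comes from the report above.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class Pmu:
    name: str            # PMU name (hypothetical, except arm_cmn_0)
    events: frozenset    # event names this PMU advertises
    per_thread_ok: bool  # toy flag: can its events be opened per-thread?


# "arm_cmn_0" is taken from the report; "core_pmu" is a made-up core PMU.
PMUS = [
    Pmu("core_pmu", frozenset({"cycles", "cpu-cycles"}), True),
    Pmu("arm_cmn_0", frozenset({"cycles"}), False),
]


def wildcard_match(pmus, event):
    """Return every PMU advertising `event`, compared case-insensitively."""
    wanted = event.lower()
    return [p for p in pmus if wanted in {e.lower() for e in p.events}]


def open_events(pmus, event, skip_failures):
    """Try to open `event` on each matching PMU.

    Returns (opened, failed). With skip_failures=False, the first PMU that
    cannot open the event aborts the whole run with EINVAL.
    """
    opened, failed = [], []
    for pmu in wildcard_match(pmus, event):
        label = f"{pmu.name}/{event}/"
        if pmu.per_thread_ok:
            opened.append(label)
        elif skip_failures:
            failed.append(label)
        else:
            raise OSError(errno.EINVAL,
                          f"Invalid event ({event}) in per-thread mode")
    return opened, failed


if __name__ == "__main__":
    # "cycles" matches both PMUs; the uncore one is reported as failed.
    print(open_events(PMUS, "cycles", skip_failures=True))
    # "cpu-cycles" only exists on the core PMU, so nothing can fail.
    print(open_events(PMUS, "cpu-cycles", skip_failures=False))
```

In this model, asking for "cycles" with failures treated as fatal raises
EINVAL, while "cpu-cycles" never touches arm_cmn_0 at all, which is the
reason for suggesting cpu-cycles or an explicit PMU.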
Thanks,
Ian
> >
> > > v2: Additional details to the cover letter. Credit to Vince Weaver
> > > added to the commit message for the event details. Additional
> > > patches to clean up perf_pmu new_alias by removing an unused term
> > > scanner argument and avoid stdio usage.
> > > https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
> > >
> > > v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
> > >
> > > Ian Rogers (28):
> > > perf stat: Allow retry for default events
> > > perf parse-events: Fix legacy cache events if event is duplicated in a
> > > PMU
> > > perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > > perf stat: Avoid wildcarding PMUs for default events
> > > perf record: Skip don't fail for events that don't open
> > > perf jevents: Support copying the source json files to OUTPUT
> > > perf pmu: Don't eagerly parse event terms
> > > perf parse-events: Remove unused FILE input argument to scanner
> > > perf pmu: Use fd rather than FILE from new_alias
> > > perf pmu: Factor term parsing into a perf_event_attr into a helper
> > > perf parse-events: Add terms for legacy hardware and cache config
> > > values
> > > perf jevents: Add legacy json terms and default_core event table
> > > helper
> > > perf pmu: Add and use legacy_terms in alias information
> > > perf jevents: Add legacy-hardware and legacy-cache json
> > > perf print-events: Remove print_hwcache_events
> > > perf print-events: Remove print_symbol_events
> > > perf parse-events: Remove hard coded legacy hardware and cache parsing
> > > perf record: Use evlist__new_default when no events specified
> > > perf top: Use evlist__new_default when no events specified
> > > perf evlist: Avoid scanning all PMUs for evlist__new_default
> > > perf evsel: Improvements to __evsel__match
> > > perf test parse-events: Use evsel__match for legacy events
> > > perf test parse-events: Without a PMU use cpu-cycles rather than
> > > cycles
> > > perf test parse-events: Remove cpu PMU requirement
> > > perf test: Switch cycles event to cpu-cycles
> > > perf test: Clean up test_..config helpers
> > > perf test parse-events: Add evlist test helper
> > > perf test parse-events: Add evsel test helper
> > >
> > > tools/perf/Makefile.perf | 21 +-
> > > tools/perf/arch/x86/util/intel-pt.c | 2 +-
> > > tools/perf/builtin-list.c | 34 +-
> > > tools/perf/builtin-record.c | 97 +-
> > > tools/perf/builtin-stat.c | 171 +-
> > > tools/perf/builtin-top.c | 8 +-
> > > tools/perf/pmu-events/Build | 24 +-
> > > .../arch/common/common/legacy-hardware.json | 72 +
> > > tools/perf/pmu-events/empty-pmu-events.c | 2771 ++++++++++++++++-
> > > tools/perf/pmu-events/jevents.py | 32 +
> > > tools/perf/pmu-events/make_legacy_cache.py | 129 +
> > > tools/perf/pmu-events/pmu-events.h | 1 +
> > > tools/perf/tests/code-reading.c | 2 +-
> > > tools/perf/tests/keep-tracking.c | 2 +-
> > > tools/perf/tests/parse-events.c | 2010 ++++++------
> > > tools/perf/tests/perf-time-to-tsc.c | 4 +-
> > > tools/perf/tests/pmu-events.c | 24 +-
> > > tools/perf/tests/pmu.c | 3 +-
> > > tools/perf/tests/switch-tracking.c | 2 +-
> > > tools/perf/util/evlist.c | 18 +-
> > > tools/perf/util/evsel.c | 21 +-
> > > tools/perf/util/parse-events.c | 282 +-
> > > tools/perf/util/parse-events.h | 22 +-
> > > tools/perf/util/parse-events.l | 54 +-
> > > tools/perf/util/parse-events.y | 114 +-
> > > tools/perf/util/perf_api_probe.c | 27 +-
> > > tools/perf/util/pmu.c | 309 +-
> > > tools/perf/util/print-events.c | 112 -
> > > tools/perf/util/print-events.h | 4 -
> > > 29 files changed, 4523 insertions(+), 1849 deletions(-)
> > > create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
> > > create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
> > >
> >