Message-ID: <aMHbIGRFeQlq9ABx@google.com>
Date: Wed, 10 Sep 2025 13:10:08 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>,
James Clark <james.clark@...aro.org>, Xu Yang <xu.yang_2@....com>,
Thomas Falcon <thomas.falcon@...el.com>,
Andi Kleen <ak@...ux.intel.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, bpf@...r.kernel.org,
Atish Patra <atishp@...osinc.com>,
Beeman Strong <beeman@...osinc.com>, Leo Yan <leo.yan@....com>,
Vince Weaver <vincent.weaver@...ne.edu>
Subject: Re: [PATCH v3 00/15] Legacy hardware/cache events as json
Hi Ian,
On Thu, Aug 28, 2025 at 01:59:15PM -0700, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json. With no hard
> coded legacy hardware or cache events, wildcarding, case
> insensitivity, etc. are consistent across events. This does, however, mean
> events like cycles will wildcard against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs.
>
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Giving sysfs/json events higher
> priority than legacy events was requested by Mark Rutland
> <mark.rutland@....com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
>
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@...nel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.
Thanks for working on this. Yeah, I think it'd be easier to handle all
events consistently with JSON. I expect the sysfs encoding to take
priority when the event is given in <PMU>/<EVENT>/ format.
>
> Note, patch 1 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 15 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.
Sounds ok.
>
> The perf list output now includes a lot more information and events. The
> behavior before, on a hybrid Alder Lake, is:
> ```
> $ perf list hw
>
> List of pre-defined events (to be used in -e or -M):
>
> branch-instructions OR branches [Hardware event]
> branch-misses [Hardware event]
> bus-cycles [Hardware event]
> cache-misses [Hardware event]
> cache-references [Hardware event]
> cpu-cycles OR cycles [Hardware event]
> instructions [Hardware event]
> ref-cycles [Hardware event]
> $ perf list hwcache
>
> List of pre-defined events (to be used in -e or -M):
>
>
> cache:
> L1-dcache-loads OR cpu_atom/L1-dcache-loads/
> L1-dcache-stores OR cpu_atom/L1-dcache-stores/
> L1-icache-loads OR cpu_atom/L1-icache-loads/
> L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
> LLC-loads OR cpu_atom/LLC-loads/
> LLC-load-misses OR cpu_atom/LLC-load-misses/
> LLC-stores OR cpu_atom/LLC-stores/
> LLC-store-misses OR cpu_atom/LLC-store-misses/
> dTLB-loads OR cpu_atom/dTLB-loads/
> dTLB-load-misses OR cpu_atom/dTLB-load-misses/
> dTLB-stores OR cpu_atom/dTLB-stores/
> dTLB-store-misses OR cpu_atom/dTLB-store-misses/
> iTLB-load-misses OR cpu_atom/iTLB-load-misses/
> branch-loads OR cpu_atom/branch-loads/
> branch-load-misses OR cpu_atom/branch-load-misses/
> L1-dcache-loads OR cpu_core/L1-dcache-loads/
> L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
> L1-dcache-stores OR cpu_core/L1-dcache-stores/
> L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
> LLC-loads OR cpu_core/LLC-loads/
> LLC-load-misses OR cpu_core/LLC-load-misses/
> LLC-stores OR cpu_core/LLC-stores/
> LLC-store-misses OR cpu_core/LLC-store-misses/
> dTLB-loads OR cpu_core/dTLB-loads/
> dTLB-load-misses OR cpu_core/dTLB-load-misses/
> dTLB-stores OR cpu_core/dTLB-stores/
> dTLB-store-misses OR cpu_core/dTLB-store-misses/
> iTLB-load-misses OR cpu_core/iTLB-load-misses/
> branch-loads OR cpu_core/branch-loads/
> branch-load-misses OR cpu_core/branch-load-misses/
> node-loads OR cpu_core/node-loads/
> node-load-misses OR cpu_core/node-load-misses/
> ```
> and after it is:
> ```
> $ perf list hw
>
> legacy hardware:
> branch-instructions
> [Retired branch instructions [This event is an alias of branches].
> Unit: cpu_atom]
> branch-misses
> [Mispredicted branch instructions. Unit: cpu_atom]
> branches
> [Retired branch instructions [This event is an alias of
> branch-instructions]. Unit: cpu_atom]
A nit: can we have one actual event and an alias of it?
I think 'branch-instructions' should be the actual event and 'branches'
the alias. Then the descriptions would look like:
branch-instructions
[Retired branch instructions. Unit: cpu_atom]
...
branches
[This event is an alias of branch-instructions.]
The same goes for 'cycles' and 'cpu-cycles'.
Thanks,
Namhyung
> bus-cycles
> [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
> cache-misses
> [Cache misses. Usually this indicates Last Level Cache misses; this is
> intended to be used in conjunction with the
> PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> Unit: cpu_atom]
> cache-references
> [Cache accesses. Usually this indicates Last Level Cache accesses but
> this may vary depending on your CPU. This may include prefetches and
> coherency messages; again this depends on the design of your CPU.
> Unit: cpu_atom]
> cpu-cycles
> [Total cycles. Be wary of what happens during CPU frequency scaling
> [This event is an alias of cycles]. Unit: cpu_atom]
> cycles
> [Total cycles. Be wary of what happens during CPU frequency scaling
> [This event is an alias of cpu-cycles]. Unit: cpu_atom]
> instructions
> [Retired instructions. Be careful,these can be affected by various
> issues,most notably hardware interrupt counts. Unit: cpu_atom]
> ref-cycles
> [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
> branch-instructions
> [Retired branch instructions [This event is an alias of branches].
> Unit: cpu_core]
> branch-misses
> [Mispredicted branch instructions. Unit: cpu_core]
> branches
> [Retired branch instructions [This event is an alias of
> branch-instructions]. Unit: cpu_core]
> bus-cycles
> [Bus cycles,which can be different from total cycles. Unit: cpu_core]
> cache-misses
> [Cache misses. Usually this indicates Last Level Cache misses; this is
> intended to be used in conjunction with the
> PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
> Unit: cpu_core]
> cache-references
> [Cache accesses. Usually this indicates Last Level Cache accesses but
> this may vary depending on your CPU. This may include prefetches and
> coherency messages; again this depends on the design of your CPU.
> Unit: cpu_core]
> cpu-cycles
> [Total cycles. Be wary of what happens during CPU frequency scaling
> [This event is an alias of cycles]. Unit: cpu_core]
> cycles
> [Total cycles. Be wary of what happens during CPU frequency scaling
> [This event is an alias of cpu-cycles]. Unit: cpu_core]
> instructions
> [Retired instructions. Be careful,these can be affected by various
> issues,most notably hardware interrupt counts. Unit: cpu_core]
> ref-cycles
> [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> $ perf list hwcache
>
> legacy cache:
> branch-load-misses
> [Branch prediction unit read misses. Unit: cpu_atom]
> branch-loads
> [Branch prediction unit read accesses. Unit: cpu_atom]
> dtlb-load-misses
> [Data TLB read misses. Unit: cpu_atom]
> dtlb-loads
> [Data TLB read accesses. Unit: cpu_atom]
> dtlb-store-misses
> [Data TLB write misses. Unit: cpu_atom]
> dtlb-stores
> [Data TLB write accesses. Unit: cpu_atom]
> itlb-load-misses
> [Instruction TLB read misses. Unit: cpu_atom]
> l1-dcache-loads
> [Level 1 data cache read accesses. Unit: cpu_atom]
> l1-dcache-stores
> [Level 1 data cache write accesses. Unit: cpu_atom]
> l1-icache-load-misses
> [Level 1 instruction cache read misses. Unit: cpu_atom]
> l1-icache-loads
> [Level 1 instruction cache read accesses. Unit: cpu_atom]
> llc-load-misses
> [Last level cache read misses. Unit: cpu_atom]
> llc-loads
> [Last level cache read accesses. Unit: cpu_atom]
> llc-store-misses
> [Last level cache write misses. Unit: cpu_atom]
> llc-stores
> [Last level cache write accesses. Unit: cpu_atom]
> branch-load-misses
> [Branch prediction unit read misses. Unit: cpu_core]
> branch-loads
> [Branch prediction unit read accesses. Unit: cpu_core]
> dtlb-load-misses
> [Data TLB read misses. Unit: cpu_core]
> dtlb-loads
> [Data TLB read accesses. Unit: cpu_core]
> dtlb-store-misses
> [Data TLB write misses. Unit: cpu_core]
> dtlb-stores
> [Data TLB write accesses. Unit: cpu_core]
> itlb-load-misses
> [Instruction TLB read misses. Unit: cpu_core]
> l1-dcache-load-misses
> [Level 1 data cache read misses. Unit: cpu_core]
> l1-dcache-loads
> [Level 1 data cache read accesses. Unit: cpu_core]
> l1-dcache-stores
> [Level 1 data cache write accesses. Unit: cpu_core]
> l1-icache-load-misses
> [Level 1 instruction cache read misses. Unit: cpu_core]
> llc-load-misses
> [Last level cache read misses. Unit: cpu_core]
> llc-loads
> [Last level cache read accesses. Unit: cpu_core]
> llc-store-misses
> [Last level cache write misses. Unit: cpu_core]
> llc-stores
> [Last level cache write accesses. Unit: cpu_core]
> node-load-misses
> [Local memory read misses. Unit: cpu_core]
> node-loads
> [Local memory read accesses. Unit: cpu_core]
> ```
>
> v3: Deprecate the legacy cache events that aren't shown in the
> previous perf list to avoid the perf list output being too verbose.
>
> v2: Additional details to the cover letter. Credit to Vince Weaver
> added to the commit message for the event details. Additional
> patches to clean up perf_pmu new_alias by removing an unused term
> scanner argument and avoid stdio usage.
> https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
>
> v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
>
> Ian Rogers (15):
> perf parse-events: Fix legacy cache events if event is duplicated in a
> PMU
> perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> perf record: Skip don't fail for events that don't open
> perf jevents: Support copying the source json files to OUTPUT
> perf pmu: Don't eagerly parse event terms
> perf parse-events: Remove unused FILE input argument to scanner
> perf pmu: Use fd rather than FILE from new_alias
> perf pmu: Factor term parsing into a perf_event_attr into a helper
> perf parse-events: Add terms for legacy hardware and cache config
> values
> perf jevents: Add legacy json terms and default_core event table
> helper
> perf pmu: Add and use legacy_terms in alias information
> perf jevents: Add legacy-hardware and legacy-cache json
> perf print-events: Remove print_hwcache_events
> perf print-events: Remove print_symbol_events
> perf parse-events: Remove hard coded legacy hardware and cache parsing
>
> tools/perf/Makefile.perf | 21 +-
> tools/perf/arch/x86/util/intel-pt.c | 2 +-
> tools/perf/builtin-list.c | 34 +-
> tools/perf/builtin-record.c | 89 +-
> tools/perf/pmu-events/Build | 24 +-
> .../arch/common/common/legacy-hardware.json | 72 +
> tools/perf/pmu-events/empty-pmu-events.c | 2763 ++++++++++++++++-
> tools/perf/pmu-events/jevents.py | 24 +
> tools/perf/pmu-events/make_legacy_cache.py | 129 +
> tools/perf/pmu-events/pmu-events.h | 1 +
> tools/perf/tests/parse-events.c | 2 +-
> tools/perf/tests/pmu-events.c | 24 +-
> tools/perf/tests/pmu.c | 3 +-
> tools/perf/util/parse-events.c | 283 +-
> tools/perf/util/parse-events.h | 16 +-
> tools/perf/util/parse-events.l | 54 +-
> tools/perf/util/parse-events.y | 114 +-
> tools/perf/util/perf_api_probe.c | 27 +-
> tools/perf/util/pmu.c | 302 +-
> tools/perf/util/print-events.c | 112 -
> tools/perf/util/print-events.h | 4 -
> 21 files changed, 3330 insertions(+), 770 deletions(-)
> create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
> create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
>
> --
> 2.51.0.318.gd7df087d1a-goog
>