lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ca47d298-331d-420c-8c4f-83cd29bae902@linaro.org>
Date: Wed, 1 Oct 2025 14:37:57 +0100
From: James Clark <james.clark@...aro.org>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Arnaldo Carvalho de Melo <acme@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
 Kan Liang <kan.liang@...ux.intel.com>, Xu Yang <xu.yang_2@....com>,
 Thomas Falcon <thomas.falcon@...el.com>, Andi Kleen <ak@...ux.intel.com>,
 linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
 Atish Patra <atishp@...osinc.com>, Beeman Strong <beeman@...osinc.com>,
 Leo Yan <leo.yan@....com>, Vince Weaver <vincent.weaver@...ne.edu>
Subject: Re: [PATCH v6 00/28] Legacy hardware/cache events as json



On 23/09/2025 11:32 pm, Ian Rogers wrote:
> Mirroring similar work for software events in commit 6e9fa4131abb
> ("perf parse-events: Remove non-json software events"). These changes
> migrate the legacy hardware and cache events to json.  With no hard
> coded legacy hardware or cache events the wild card, case
> insensitivity, etc. is consistent for events. This does, however, mean
> events like cycles will wild card against all PMUs. A change doing the
> same was originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> his dislike for the cycles behavior on ARM with perf record. Earlier
> patches in this series make perf record event opening failures
> non-fatal and hide the cycles event's failure to open on ARM in perf
> record, so it is expected the behavior will now be transparent in perf
> record on ARM. perf stat with a cycles event will wildcard open the
> event on all PMUs, however, with default events the cycles event will
> only be opened on core PMUs.
> 
> The key motivation for these patches is so that if, for example, you
> run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
> printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
> the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ be the
> same, similarly for the cpu_atom event. Prior to these patches the
> event with a PMU prefers sysfs/json over legacy encodings while with
> no PMU legacy encodings are preferred - these are different encodings
> on x86.
> 
> The change to support legacy events with PMUs was done to clean up
> Intel's hybrid PMU implementation. Having sysfs/json events with
> increased priority to legacy was requested by Mark Rutland
>   <mark.rutland@....com> to fix Apple-M PMU issues wrt broken legacy
> events on that PMU. It is believed the PMU driver is now fixed, but
> this has only been confirmed on ARM Juno boards. It was requested that
> RISC-V be able to add events to the perf tool json so the PMU driver
> didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> This patch series achieves this.
> 
> A previous series of patches decreasing legacy hardware event
> priorities was posted in:
> https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
> Namhyung Kim <namhyung@...nel.org> mentioned that hardware and
> software events can be implemented similarly:
> https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
> and this patch series achieves this.
> 
> Note, patch 2 (perf parse-events: Fix legacy cache events if event is
> duplicated in a PMU) fixes a function deleted by patch 17 (perf
> parse-events: Remove hard coded legacy hardware and cache
> parsing). Adding the json exposed an issue when legacy cache (not
> legacy hardware) and sysfs/json events exist. The fix is necessary to
> keep tests passing through the series. It is also posted for backports
> to stable trees.
> 
> The perf list behavior includes a lot more information and events. The
> before behavior on a hybrid alderlake is:
> ```
> $ perf list hw
> 
> List of pre-defined events (to be used in -e or -M):
> 
>    branch-instructions OR branches                    [Hardware event]
>    branch-misses                                      [Hardware event]
>    bus-cycles                                         [Hardware event]
>    cache-misses                                       [Hardware event]
>    cache-references                                   [Hardware event]
>    cpu-cycles OR cycles                               [Hardware event]
>    instructions                                       [Hardware event]
>    ref-cycles                                         [Hardware event]
> $ perf list hwcache
> 
> List of pre-defined events (to be used in -e or -M):
> 
> 
> cache:
>    L1-dcache-loads OR cpu_atom/L1-dcache-loads/
>    L1-dcache-stores OR cpu_atom/L1-dcache-stores/
>    L1-icache-loads OR cpu_atom/L1-icache-loads/
>    L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
>    LLC-loads OR cpu_atom/LLC-loads/
>    LLC-load-misses OR cpu_atom/LLC-load-misses/
>    LLC-stores OR cpu_atom/LLC-stores/
>    LLC-store-misses OR cpu_atom/LLC-store-misses/
>    dTLB-loads OR cpu_atom/dTLB-loads/
>    dTLB-load-misses OR cpu_atom/dTLB-load-misses/
>    dTLB-stores OR cpu_atom/dTLB-stores/
>    dTLB-store-misses OR cpu_atom/dTLB-store-misses/
>    iTLB-load-misses OR cpu_atom/iTLB-load-misses/
>    branch-loads OR cpu_atom/branch-loads/
>    branch-load-misses OR cpu_atom/branch-load-misses/
>    L1-dcache-loads OR cpu_core/L1-dcache-loads/
>    L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
>    L1-dcache-stores OR cpu_core/L1-dcache-stores/
>    L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
>    LLC-loads OR cpu_core/LLC-loads/
>    LLC-load-misses OR cpu_core/LLC-load-misses/
>    LLC-stores OR cpu_core/LLC-stores/
>    LLC-store-misses OR cpu_core/LLC-store-misses/
>    dTLB-loads OR cpu_core/dTLB-loads/
>    dTLB-load-misses OR cpu_core/dTLB-load-misses/
>    dTLB-stores OR cpu_core/dTLB-stores/
>    dTLB-store-misses OR cpu_core/dTLB-store-misses/
>    iTLB-load-misses OR cpu_core/iTLB-load-misses/
>    branch-loads OR cpu_core/branch-loads/
>    branch-load-misses OR cpu_core/branch-load-misses/
>    node-loads OR cpu_core/node-loads/
>    node-load-misses OR cpu_core/node-load-misses/
> ```
> and after it is:
> ```
> $ perf list hw
> 
> legacy hardware:
>    branch-instructions
>         [Retired branch instructions [This event is an alias of branches].
>          Unit: cpu_atom]
>    branch-misses
>         [Mispredicted branch instructions. Unit: cpu_atom]
>    branches
>         [Retired branch instructions [This event is an alias of
>          branch-instructions]. Unit: cpu_atom]
>    bus-cycles
>         [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
>    cache-misses
>         [Cache misses. Usually this indicates Last Level Cache misses; this is
>          intended to be used in conjunction with the
>          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>          Unit: cpu_atom]
>    cache-references
>         [Cache accesses. Usually this indicates Last Level Cache accesses but
>          this may vary depending on your CPU. This may include prefetches and
>          coherency messages; again this depends on the design of your CPU.
>          Unit: cpu_atom]
>    cpu-cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cycles]. Unit: cpu_atom]
>    cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cpu-cycles]. Unit: cpu_atom]
>    instructions
>         [Retired instructions. Be careful,these can be affected by various
>          issues,most notably hardware interrupt counts. Unit: cpu_atom]
>    ref-cycles
>         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
>    branch-instructions
>         [Retired branch instructions [This event is an alias of branches].
>          Unit: cpu_core]
>    branch-misses
>         [Mispredicted branch instructions. Unit: cpu_core]
>    branches
>         [Retired branch instructions [This event is an alias of
>          branch-instructions]. Unit: cpu_core]
>    bus-cycles
>         [Bus cycles,which can be different from total cycles. Unit: cpu_core]
>    cache-misses
>         [Cache misses. Usually this indicates Last Level Cache misses; this is
>          intended to be used in conjunction with the
>          PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
>          Unit: cpu_core]
>    cache-references
>         [Cache accesses. Usually this indicates Last Level Cache accesses but
>          this may vary depending on your CPU. This may include prefetches and
>          coherency messages; again this depends on the design of your CPU.
>          Unit: cpu_core]
>    cpu-cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cycles]. Unit: cpu_core]
>    cycles
>         [Total cycles. Be wary of what happens during CPU frequency scaling
>          [This event is an alias of cpu-cycles]. Unit: cpu_core]
>    instructions
>         [Retired instructions. Be careful,these can be affected by various
>          issues,most notably hardware interrupt counts. Unit: cpu_core]
>    ref-cycles
>         [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
> $ perf list hwcache
> 
> legacy cache:
>    branch-load-misses
>         [Branch prediction unit read misses. Unit: cpu_atom]
>    branch-loads
>         [Branch prediction unit read accesses. Unit: cpu_atom]
>    dtlb-load-misses
>         [Data TLB read misses. Unit: cpu_atom]
>    dtlb-loads
>         [Data TLB read accesses. Unit: cpu_atom]
>    dtlb-store-misses
>         [Data TLB write misses. Unit: cpu_atom]
>    dtlb-stores
>         [Data TLB write accesses. Unit: cpu_atom]
>    itlb-load-misses
>         [Instruction TLB read misses. Unit: cpu_atom]
>    l1-dcache-loads
>         [Level 1 data cache read accesses. Unit: cpu_atom]
>    l1-dcache-stores
>         [Level 1 data cache write accesses. Unit: cpu_atom]
>    l1-icache-load-misses
>         [Level 1 instruction cache read misses. Unit: cpu_atom]
>    l1-icache-loads
>         [Level 1 instruction cache read accesses. Unit: cpu_atom]
>    llc-load-misses
>         [Last level cache read misses. Unit: cpu_atom]
>    llc-loads
>         [Last level cache read accesses. Unit: cpu_atom]
>    llc-store-misses
>         [Last level cache write misses. Unit: cpu_atom]
>    llc-stores
>         [Last level cache write accesses. Unit: cpu_atom]
>    branch-load-misses
>         [Branch prediction unit read misses. Unit: cpu_core]
>    branch-loads
>         [Branch prediction unit read accesses. Unit: cpu_core]
>    dtlb-load-misses
>         [Data TLB read misses. Unit: cpu_core]
>    dtlb-loads
>         [Data TLB read accesses. Unit: cpu_core]
>    dtlb-store-misses
>         [Data TLB write misses. Unit: cpu_core]
>    dtlb-stores
>         [Data TLB write accesses. Unit: cpu_core]
>    itlb-load-misses
>         [Instruction TLB read misses. Unit: cpu_core]
>    l1-dcache-load-misses
>         [Level 1 data cache read misses. Unit: cpu_core]
>    l1-dcache-loads
>         [Level 1 data cache read accesses. Unit: cpu_core]
>    l1-dcache-stores
>         [Level 1 data cache write accesses. Unit: cpu_core]
>    l1-icache-load-misses
>         [Level 1 instruction cache read misses. Unit: cpu_core]
>    llc-load-misses
>         [Last level cache read misses. Unit: cpu_core]
>    llc-loads
>         [Last level cache read accesses. Unit: cpu_core]
>    llc-store-misses
>         [Last level cache write misses. Unit: cpu_core]
>    llc-stores
>         [Last level cache write accesses. Unit: cpu_core]
>    node-load-misses
>         [Local memory read misses. Unit: cpu_core]
>    node-loads
>         [Local memory read accesses. Unit: cpu_core]
> ```
> 
> v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
>      specified. Add patches to make failures in the parse-events test
>      easier to diagnose. Reorder the perf stat default events patch to
>      come earlier.
> 
> v5. Add patch for retrying default events, fixing regression when
>      non-root and paranoid. Make cycles to cpu-cycles test event change
>      (to avoid non-core ARM events) the default on all architectures
>      (suggested by Namhyung). Switch all non-test cases to specifying a
>      PMU. Improvements to the parse-events test including core PMU
>      parsing support for architectures without a "cpu" PMU.
>      https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/
> 
> v4: Fixes for matching hard coded metrics in stat-shadow. Make the
>      default "cycles" event string on ARM "cpu-cycles" which is the
>      same legacy event but avoids name collisions on ARM PMUs. To
>      support this, use evlist__new_default for the no command line
>      event case in `perf record` and `perf top`. Make
>      evlist__new_default only scan core PMUs.
>      https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t
> 
> v3: Deprecate the legacy cache events that aren't shown in the
>      previous perf list to avoid the perf list output being too verbose.
>      https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/
> 

Hi Ian,

Did you drop the change to ignore failures to open events in favour of 
switching the default from "cycles" to "cpu-cycles" instead? I'm trying 
to follow the changelog but couldn't see it.

In v3 I got <not supported> for the uncore cycles event, but in v6 I get 
a complete failure:

  -> sudo perf-v3 stat -e cycles -- true

  Performance counter stats for 'true':

            1732478      cycles 

    <not supported>      arm_cmn_0/cycles/


  -> sudo perf-v6 stat -e cycles -- true
  Error:
  Invalid event (cycles) in per-thread mode, enable system wide with '-
  a'.

The verbose output shows that it tries both, but doesn't ignore the 
error on arm_cmn_0 anymore:

-> sudo perf-v6 stat -e cycles -vvv -- true
Control descriptor is not initialized
Opening: cycles
------------------------------------------------------------
perf_event_attr:
   type                             0 (PERF_TYPE_HARDWARE)
   size                             136
   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
   sample_type                      IDENTIFIER
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
   disabled                         1
   inherit                          1
   enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 9646  cpu -1  group_fd -1  flags 0x8 = 3
Opening: cycles
------------------------------------------------------------
perf_event_attr:
   type                             11 (arm_cmn_0)
   size                             136
Required parameter 'wp_dev_sel' not specified
Required parameter 'wp_dev_sel' not specified
   config                           0x3 (cycles)
   sample_type                      IDENTIFIER
   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
   disabled                         1
   inherit                          1
   enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 9646  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open failed, error -22
switching off exclude_guest for PMU arm_cmn_0
Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit, 
falling back to no-inherit.
Warning:
cycles event is not supported by the kernel.
Error:
Invalid event (cycles) in per-thread mode, enable system wide with '-a'.


> v2: Additional details to the cover letter. Credit to Vince Weaver
>      added to the commit message for the event details. Additional
>      patches to clean up perf_pmu new_alias by removing an unused term
>      scanner argument and avoid stdio usage.
>      https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/
> 
> v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/
> 
> Ian Rogers (28):
>    perf stat: Allow retry for default events
>    perf parse-events: Fix legacy cache events if event is duplicated in a
>      PMU
>    perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
>    perf stat: Avoid wildcarding PMUs for default events
>    perf record: Skip don't fail for events that don't open
>    perf jevents: Support copying the source json files to OUTPUT
>    perf pmu: Don't eagerly parse event terms
>    perf parse-events: Remove unused FILE input argument to scanner
>    perf pmu: Use fd rather than FILE from new_alias
>    perf pmu: Factor term parsing into a perf_event_attr into a helper
>    perf parse-events: Add terms for legacy hardware and cache config
>      values
>    perf jevents: Add legacy json terms and default_core event table
>      helper
>    perf pmu: Add and use legacy_terms in alias information
>    perf jevents: Add legacy-hardware and legacy-cache json
>    perf print-events: Remove print_hwcache_events
>    perf print-events: Remove print_symbol_events
>    perf parse-events: Remove hard coded legacy hardware and cache parsing
>    perf record: Use evlist__new_default when no events specified
>    perf top: Use evlist__new_default when no events specified
>    perf evlist: Avoid scanning all PMUs for evlist__new_default
>    perf evsel: Improvements to __evsel__match
>    perf test parse-events: Use evsel__match for legacy events
>    perf test parse-events: Without a PMU use cpu-cycles rather than
>      cycles
>    perf test parse-events: Remove cpu PMU requirement
>    perf test: Switch cycles event to cpu-cycles
>    perf test: Clean up test_..config helpers
>    perf test parse-events: Add evlist test helper
>    perf test parse-events: Add evsel test helper
> 
>   tools/perf/Makefile.perf                      |   21 +-
>   tools/perf/arch/x86/util/intel-pt.c           |    2 +-
>   tools/perf/builtin-list.c                     |   34 +-
>   tools/perf/builtin-record.c                   |   97 +-
>   tools/perf/builtin-stat.c                     |  171 +-
>   tools/perf/builtin-top.c                      |    8 +-
>   tools/perf/pmu-events/Build                   |   24 +-
>   .../arch/common/common/legacy-hardware.json   |   72 +
>   tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
>   tools/perf/pmu-events/jevents.py              |   32 +
>   tools/perf/pmu-events/make_legacy_cache.py    |  129 +
>   tools/perf/pmu-events/pmu-events.h            |    1 +
>   tools/perf/tests/code-reading.c               |    2 +-
>   tools/perf/tests/keep-tracking.c              |    2 +-
>   tools/perf/tests/parse-events.c               | 2010 ++++++------
>   tools/perf/tests/perf-time-to-tsc.c           |    4 +-
>   tools/perf/tests/pmu-events.c                 |   24 +-
>   tools/perf/tests/pmu.c                        |    3 +-
>   tools/perf/tests/switch-tracking.c            |    2 +-
>   tools/perf/util/evlist.c                      |   18 +-
>   tools/perf/util/evsel.c                       |   21 +-
>   tools/perf/util/parse-events.c                |  282 +-
>   tools/perf/util/parse-events.h                |   22 +-
>   tools/perf/util/parse-events.l                |   54 +-
>   tools/perf/util/parse-events.y                |  114 +-
>   tools/perf/util/perf_api_probe.c              |   27 +-
>   tools/perf/util/pmu.c                         |  309 +-
>   tools/perf/util/print-events.c                |  112 -
>   tools/perf/util/print-events.h                |    4 -
>   29 files changed, 4523 insertions(+), 1849 deletions(-)
>   create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
>   create mode 100755 tools/perf/pmu-events/make_legacy_cache.py
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ