Message-Id: <20230426070050.1315519-1-irogers@google.com>
Date: Wed, 26 Apr 2023 00:00:10 -0700
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
Ahmad Yasin <ahmad.yasin@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Stephane Eranian <eranian@...gle.com>,
Andi Kleen <ak@...ux.intel.com>,
Perry Taylor <perry.taylor@...el.com>,
Samantha Alt <samantha.alt@...el.com>,
Caleb Biggers <caleb.biggers@...el.com>,
Weilin Wang <weilin.wang@...el.com>,
Edward Baker <edward.baker@...el.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Florian Fischer <florian.fischer@...q.space>,
Rob Herring <robh@...nel.org>,
Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
John Garry <john.g.garry@...cle.com>,
Kajol Jain <kjain@...ux.ibm.com>,
Sumanth Korikkar <sumanthk@...ux.ibm.com>,
Thomas Richter <tmricht@...ux.ibm.com>,
Tiezhu Yang <yangtiezhu@...ngson.cn>,
Ravi Bangoria <ravi.bangoria@....com>,
Leo Yan <leo.yan@...aro.org>,
Yang Jihong <yangjihong1@...wei.com>,
James Clark <james.clark@....com>,
Suzuki Poulouse <suzuki.poulose@....com>,
Kang Minchul <tegongkang@...il.com>,
Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Ian Rogers <irogers@...gle.com>
Subject: [PATCH v1 00/40] Fix perf on Intel hybrid CPUs
TL;DR: hybrid doesn't crash, json metrics work on hybrid on both PMUs
or individually, event parsing doesn't always scan all PMUs, more and
new tests that also run without hybrid, less code.
The first patches were previously posted to improve metrics here:
"perf stat: Introduce skippable evsels"
https://lore.kernel.org/all/20230414051922.3625666-1-irogers@google.com/
"perf vendor events intel: Add xxx metric constraints"
https://lore.kernel.org/all/20230419005423.343862-1-irogers@google.com/
Next are some general test improvements.
Next, event parsing is rewritten so that it no longer scans all PMUs
for the benefit of raw and legacy cache parsing; instead, these are
handled by the lexer and a new term type. This ultimately removes the
need for the hybrid event parser to be recursive, as a legacy cache
event can be just a term. Tests are re-enabled for events with
hyphens, so AMD's branch-brs event is now parsable.
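As a rough illustration of recognising legacy cache names in the lexer
without consulting any PMU, here is a minimal Python sketch. It is not
perf's flex grammar: the name tables are an illustrative subset, and
classify() is a hypothetical helper. Hardware event names such as
branch-misses, which perf handles separately, are not modelled.

```python
import re

# Illustrative subset of legacy cache name parts (assumption: the real
# tables in perf are larger and include more aliases).
CACHE_TYPES = ["L1-dcache", "L1-icache", "LLC", "dTLB", "iTLB", "branch", "node"]
CACHE_OPS = ["loads", "load", "stores", "store", "prefetches", "prefetch"]
CACHE_RESULTS = ["refs", "misses", "miss"]

_TYPE = "|".join(re.escape(t) for t in CACHE_TYPES)
_OP_OR_RESULT = "|".join(CACHE_OPS + CACHE_RESULTS)

# A cache type followed by up to two "-op" or "-result" suffixes.
LEGACY_CACHE_RE = re.compile(rf"(?:{_TYPE})(?:-(?:{_OP_OR_RESULT})){{0,2}}")


def classify(token: str) -> str:
    """Classify one event token purely from its spelling (hypothetical helper)."""
    if LEGACY_CACHE_RE.fullmatch(token):
        return "legacy_cache"
    # Anything else, including hyphenated names, is left as a plain
    # event name to be looked up on a PMU later.
    return "name"


if __name__ == "__main__":
    for tok in ["L1-dcache-load-misses", "LLC", "branch-brs"]:
        print(tok, "->", classify(tok))
```

Because the whole token must match, a hyphenated name like branch-brs
is not split at the hyphen and falls through as an ordinary name.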
The cputype option is made a generic pmu filter flag and is tested
even on non-hybrid systems.
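To make the idea of a generic PMU filter concrete, here is a small
Python sketch. It is a model of the behaviour, not the perf
implementation: filter_events() and the tuple layout are hypothetical,
and the core/atom to cpu_core/cpu_atom mapping is only inferred from
the --cputype outputs later in this message.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (PMU name, event name)

# Assumption: mapping inferred from the --cputype example outputs.
CPUTYPE_TO_PMU = {"core": "cpu_core", "atom": "cpu_atom"}


def filter_events(events: List[Event], pmu_filter: Optional[str]) -> List[Event]:
    """Keep only events on the requested PMU; no filter keeps everything."""
    if pmu_filter is None:
        return list(events)
    return [ev for ev in events if ev[0] == pmu_filter]


if __name__ == "__main__":
    events = [
        ("cpu_core", "TOPDOWN.SLOTS"),
        ("cpu_atom", "TOPDOWN_RETIRING.ALL"),
        ("cpu_core", "INT_MISC.UOP_DROPPING"),
    ]
    print(filter_events(events, CPUTYPE_TO_PMU["core"]))
```

On a non-hybrid system the same filter simply selects the single core
PMU, which is why it can be tested there too.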
The final patches address specific json metric issues on hybrid, in
both the json metrics and the metric code. They also bring in a new
json option to not group events when matching a metricgroup, which
helps reduce counter pressure for the TopdownL1 and TopdownL2 metric
groups. The updates to the script that updates the json are posted in:
https://github.com/intel/perfmon/pull/73
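The effect of not grouping on counter pressure can be sketched as
follows. This is a simplified Python model, not the metricgroup code:
events_to_open() is a hypothetical helper, and the per-metric event
lists are illustrative rather than the real metric formulas.

```python
from typing import Dict, List


def events_to_open(metrics: Dict[str, List[str]], group_per_metric: bool) -> List[str]:
    """Return the events that would be opened for a set of metrics."""
    if group_per_metric:
        # Each metric gets its own group, so an event shared by several
        # metrics is opened once per metric.
        return [ev for evs in metrics.values() for ev in evs]
    # Without grouping, a shared event is opened once and reused.
    seen: Dict[str, None] = {}
    for evs in metrics.values():
        for ev in evs:
            seen.setdefault(ev, None)
    return list(seen)


if __name__ == "__main__":
    # Illustrative event lists (assumption), using event names that
    # appear in the cpu_atom outputs in this message.
    metrics = {
        "tma_frontend_bound": ["TOPDOWN_FE_BOUND.ALL", "CPU_CLK_UNHALTED.CORE"],
        "tma_backend_bound": ["TOPDOWN_BE_BOUND.ALL", "CPU_CLK_UNHALTED.CORE"],
        "tma_retiring": ["TOPDOWN_RETIRING.ALL", "CPU_CLK_UNHALTED.CORE"],
    }
    print(len(events_to_open(metrics, True)), "events when grouped per metric")
    print(len(events_to_open(metrics, False)), "events when not grouped")
```

In this toy input the shared clock event is opened three times when
grouped and once when not, which is the kind of saving the option is
aiming for.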
The patches add slightly more code than they remove, in areas like
better json metric constraints and tests, but in the core util code,
the removal of hybrid is a net reduction:
20 files changed, 631 insertions(+), 951 deletions(-)
There's specific detail with each patch, but for now here is the 6.3
output followed by that from perf-tools-next with the patch series
applied. The tool is running on an Alderlake CPU on an elderly 5.15
kernel:
Events on hybrid that parse and pass tests:
'''
$ perf-6.3 version
perf version 6.3.rc7.gb7bc77e2f2c7
$ perf-6.3 test
...
6.1: Test event parsing : FAILED!
...
$ perf test
...
6: Parse event definition strings :
6.1: Test event parsing : Ok
6.2: Parsing of all PMU events from sysfs : Ok
6.3: Parsing of given PMU events from sysfs : Ok
6.4: Parsing of aliased events from sysfs : Skip (no aliases in sysfs)
6.5: Parsing of aliased events : Ok
6.6: Parsing of terms (event modifiers) : Ok
...
'''
With no event or metric specified, json metrics and TopdownL1 run on both PMUs:
'''
$ perf-6.3 stat -a sleep 1
Performance counter stats for 'system wide':
24,073.58 msec cpu-clock # 23.975 CPUs utilized
350 context-switches # 14.539 /sec
25 cpu-migrations # 1.038 /sec
66 page-faults # 2.742 /sec
21,257,199 cpu_core/cycles/ # 883.009 K/sec
2,162,192 cpu_atom/cycles/ # 89.816 K/sec
6,679,379 cpu_core/instructions/ # 277.457 K/sec
753,197 cpu_atom/instructions/ # 31.287 K/sec
1,300,647 cpu_core/branches/ # 54.028 K/sec
148,652 cpu_atom/branches/ # 6.175 K/sec
117,429 cpu_core/branch-misses/ # 4.878 K/sec
14,396 cpu_atom/branch-misses/ # 598.000 /sec
123,097,644 cpu_core/slots/ # 5.113 M/sec
9,241,207 cpu_core/topdown-retiring/ # 7.5% Retiring
8,903,288 cpu_core/topdown-bad-spec/ # 7.2% Bad Speculation
66,590,029 cpu_core/topdown-fe-bound/ # 54.1% Frontend Bound
38,397,500 cpu_core/topdown-be-bound/ # 31.2% Backend Bound
3,294,283 cpu_core/topdown-heavy-ops/ # 2.7% Heavy Operations # 4.8% Light Operations
8,855,769 cpu_core/topdown-br-mispredict/ # 7.2% Branch Mispredict # 0.0% Machine Clears
57,695,714 cpu_core/topdown-fetch-lat/ # 46.9% Fetch Latency # 7.2% Fetch Bandwidth
12,823,926 cpu_core/topdown-mem-bound/ # 10.4% Memory Bound # 20.8% Core Bound
1.004093622 seconds time elapsed
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
24,064.65 msec cpu-clock # 23.973 CPUs utilized
384 context-switches # 15.957 /sec
24 cpu-migrations # 0.997 /sec
71 page-faults # 2.950 /sec
19,737,646 cpu_core/cycles/ # 820.192 K/sec
122,018,505 cpu_atom/cycles/ # 5.070 M/sec (63.32%)
7,636,653 cpu_core/instructions/ # 317.339 K/sec
16,266,629 cpu_atom/instructions/ # 675.955 K/sec (72.50%)
1,552,995 cpu_core/branches/ # 64.534 K/sec
3,208,143 cpu_atom/branches/ # 133.314 K/sec (72.50%)
132,151 cpu_core/branch-misses/ # 5.491 K/sec
547,285 cpu_atom/branch-misses/ # 22.742 K/sec (72.49%)
32,110,597 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.334 M/sec
# 18.4 % tma_bad_speculation (72.48%)
228,006,765 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.475 M/sec
# 38.1 % tma_frontend_bound (72.47%)
225,866,251 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.386 M/sec
# 37.7 % tma_backend_bound
# 37.7 % tma_backend_bound_aux (72.73%)
119,748,254 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 4.976 M/sec
# 5.2 % tma_retiring (73.14%)
31,363,579 cpu_atom/TOPDOWN_RETIRING.ALL/ # 1.303 M/sec (73.37%)
227,907,321 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 9.471 M/sec (63.95%)
228,803,268 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 9.508 M/sec (63.55%)
113,357,334 cpu_core/TOPDOWN.SLOTS/ # 30.5 % tma_backend_bound
# 9.2 % tma_retiring
# 8.7 % tma_bad_speculation
# 51.6 % tma_frontend_bound
10,451,044 cpu_core/topdown-retiring/
9,687,449 cpu_core/topdown-bad-spec/
58,703,214 cpu_core/topdown-fe-bound/
34,540,660 cpu_core/topdown-be-bound/
154,902 cpu_core/INT_MISC.UOP_DROPPING/ # 6.437 K/sec
1.003818397 seconds time elapsed
'''
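As a sanity check on the numbers, the cpu_core level 1 percentages in
the perf-6.3 output above are consistent with dividing each topdown
count by the slots count. A short Python sketch, using the counts
copied from that output; percent_of_slots() is a hypothetical helper
and the formula is an observation about these numbers, not a
statement of how perf computes them internally:

```python
# Counts copied from the perf-6.3 'perf stat -a sleep 1' output above.
SLOTS = 123_097_644
TOPDOWN = {
    "Retiring": 9_241_207,
    "Bad Speculation": 8_903_288,
    "Frontend Bound": 66_590_029,
    "Backend Bound": 38_397_500,
}


def percent_of_slots(count: int, slots: int) -> float:
    """Express a topdown count as a percentage of slots, to one decimal."""
    return round(100.0 * count / slots, 1)


if __name__ == "__main__":
    for name, count in TOPDOWN.items():
        print(f"{percent_of_slots(count, SLOTS):5.1f}% {name}")
```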
Json metrics that don't crash:
'''
$ perf-6.3 stat -M TopdownL1 -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
anon group { topdown-retiring, topdown-retiring, INT_MISC.UOP_DROPPING, topdown-fe-bound, topdown-fe-bound, CPU_CLK_UNHALTED.CORE, topdown-be-bound, topdown-be-bound, topdown-bad-spec, topdown-bad-spec }
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
/bin/dmesg | grep -i perf may provide additional information.
$ perf stat -M TopdownL1 -a sleep 1
Performance counter stats for 'system wide':
811,810 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.6 % tma_bad_speculation
3,239,281 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.8 % tma_frontend_bound
2,037,667 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 24.4 % tma_backend_bound
# 24.4 % tma_backend_bound_aux
1,670,438 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.7 % tma_retiring
808,138 cpu_atom/TOPDOWN_RETIRING.ALL/
3,234,707 cpu_atom/TOPDOWN_FE_BOUND.ALL/
2,081,420 cpu_atom/TOPDOWN_BE_BOUND.ALL/
122,795,280 cpu_core/TOPDOWN.SLOTS/ # 31.7 % tma_backend_bound
# 7.0 % tma_bad_speculation
# 54.1 % tma_frontend_bound
# 7.2 % tma_retiring
8,817,636 cpu_core/topdown-retiring/
8,480,817 cpu_core/topdown-bad-spec/
3,108,926 cpu_core/topdown-heavy-ops/
66,566,215 cpu_core/topdown-fe-bound/
38,958,811 cpu_core/topdown-be-bound/
134,194 cpu_core/INT_MISC.UOP_DROPPING/
1.003607796 seconds time elapsed
$ perf stat -M TopdownL2 -a sleep 1
Performance counter stats for 'system wide':
162,334,218 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_LATENCY/ # 27.7 % tma_fetch_latency (38.99%)
16,191,486 cpu_atom/INST_RETIRED.ANY/ (45.76%)
68,443,205 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 32.2 % tma_memory_bound
# 5.8 % tma_core_bound (45.77%)
14,920,109 cpu_atom/UOPS_RETIRED.MS/ # 2.9 % tma_base (45.92%)
14,829,879 cpu_atom/UOPS_RETIRED.MS/ # 2.5 % tma_ms_uops (46.31%)
31,860,520 cpu_atom/TOPDOWN_RETIRING.ALL/ (46.71%)
117,323,055 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 18.7 % tma_branch_mispredicts
# 11.5 % tma_fetch_bandwidth
# 0.3 % tma_machine_clears
# 37.9 % tma_resource_bound (53.49%)
222,579,768 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (53.90%)
13,672,174 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (54.23%)
24,264,262 cpu_atom/LD_HEAD.ANY_AT_RET/ (47.46%)
13,872,813 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (47.45%)
223,722,007 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (47.31%)
2,005,972 cpu_atom/TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS/ (46.91%)
109,423,013 cpu_atom/TOPDOWN_BAD_SPECULATION.MISPREDICT/ (39.72%)
67,420,790 cpu_atom/TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH/ (39.33%)
92,790,312 cpu_core/TOPDOWN.SLOTS/ # 24.3 % tma_core_bound
# 3.0 % tma_heavy_operations
# 5.6 % tma_light_operations
# 10.8 % tma_memory_bound
# 7.8 % tma_branch_mispredicts
# 40.4 % tma_fetch_latency
# 0.2 % tma_machine_clears
# 7.8 % tma_fetch_bandwidth
8,041,595 cpu_core/topdown-retiring/
10,060,500 cpu_core/topdown-mem-bound/
7,314,344 cpu_core/topdown-bad-spec/
2,824,600 cpu_core/topdown-heavy-ops/
37,630,164 cpu_core/topdown-fetch-lat/
7,278,843 cpu_core/topdown-br-mispredict/
44,863,148 cpu_core/topdown-fe-bound/
32,573,458 cpu_core/topdown-be-bound/
5,785,074 cpu_core/INST_RETIRED.ANY/
2,325,424 cpu_core/UOPS_RETIRED.MS/
15,972,774 cpu_core/CPU_CLK_UNHALTED.THREAD/
117,750 cpu_core/INT_MISC.UOP_DROPPING/
1.003519749 seconds time elapsed
'''
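The perf-6.3 failure for -M TopdownL1 above warns that one group holds
events from both hybrid PMUs before the open fails with error 22. One
way to picture keeping groups PMU specific is to partition a metric's
events by PMU before grouping. The Python sketch below is only a
model under that assumption, not necessarily how the series
implements it; split_by_pmu() is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (PMU name, event name)


def split_by_pmu(events: List[Event]) -> Dict[str, List[str]]:
    """Partition events so that each resulting group has a single PMU."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for pmu, name in events:
        groups[pmu].append(name)
    return dict(groups)


if __name__ == "__main__":
    mixed = [
        ("cpu_core", "topdown-retiring"),
        ("cpu_atom", "TOPDOWN_RETIRING.ALL"),
        ("cpu_core", "INT_MISC.UOP_DROPPING"),
        ("cpu_atom", "CPU_CLK_UNHALTED.CORE"),
    ]
    for pmu, names in split_by_pmu(mixed).items():
        print(pmu, "{", ", ".join(names), "}")
```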
Note: the --metric-no-group and --metric-no-threshold flags are added
below to reduce the size of the output by removing event grouping and
threshold printing support:
'''
$ perf stat --metric-no-threshold --metric-no-group -M TopdownL3 -a sleep 1
Performance counter stats for 'system wide':
3,506,641 cpu_atom/TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS/ # 0.6 % tma_alloc_restriction (17.14%)
133,962,390 cpu_atom/TOPDOWN_BE_BOUND.SERIALIZATION/ # 22.2 % tma_serialization (17.48%)
11,201,207 cpu_atom/TOPDOWN_FE_BOUND.ITLB/ # 1.9 % tma_itlb_misses (17.88%)
63,876,838 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 10.6 % tma_mem_scheduler
# 10.5 % tma_store_bound
# 2.4 % tma_other_load_store (18.28%)
14,386,940 cpu_atom/UOPS_RETIRED.MS/ (18.68%)
14,432,493 cpu_atom/UOPS_RETIRED.MS/ # 2.7 % tma_other_ret (19.09%)
81,582,687 cpu_atom/TOPDOWN_FE_BOUND.ICACHE/ # 13.5 % tma_icache_misses (19.14%)
30,467,546 cpu_atom/TOPDOWN_RETIRING.ALL/ (19.14%)
16,788,753 cpu_atom/MEM_BOUND_STALLS.LOAD/ # 4.2 % tma_dram_bound
# 3.7 % tma_l2_bound
# 6.7 % tma_l3_bound (19.14%)
14,514,040 cpu_atom/TOPDOWN_FE_BOUND.DECODE/ # 2.4 % tma_decode (19.14%)
688,307 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.1 % tma_nuke (19.13%)
0 cpu_atom/UOPS_RETIRED.FPDIV/ (19.12%)
4,408,466 cpu_atom/MEM_BOUND_STALLS.LOAD_L2_HIT/ (19.12%)
120,556,998 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 9.3 % tma_branch_detect
# 1.0 % tma_branch_resteer
# 5.8 % tma_cisc
# 0.3 % tma_fast_nuke
# 0.0 % tma_fpdiv_uops
# 4.3 % tma_l1_bound
# 3.2 % tma_non_mem_scheduler
# 1.9 % tma_other_fb
# 1.1 % tma_predecode
# 0.1 % tma_register
# 0.1 % tma_reorder_buffer (22.30%)
34,773,106 cpu_atom/TOPDOWN_FE_BOUND.CISC/ (22.30%)
591,112 cpu_atom/TOPDOWN_BE_BOUND.REGISTER/ (22.30%)
11,286,706 cpu_atom/TOPDOWN_FE_BOUND.OTHER/ (22.30%)
5,082,636 cpu_atom/MEM_BOUND_STALLS.LOAD_DRAM_HIT/ (22.30%)
14,146,185 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (22.31%)
55,833,686 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_DETECT/ (22.30%)
25,714,051 cpu_atom/LD_HEAD.ANY_AT_RET/ (19.12%)
456,549 cpu_atom/TOPDOWN_BE_BOUND.REORDER_BUFFER/ (19.12%)
1,616,862 cpu_atom/TOPDOWN_BAD_SPECULATION.FASTNUKE/ (19.12%)
6,680,782 cpu_atom/TOPDOWN_FE_BOUND.PREDECODE/ (19.12%)
14,229,195 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (19.12%)
8,128,921 cpu_atom/MEM_BOUND_STALLS.LOAD_LLC_HIT/ (19.12%)
20,941,725 cpu_atom/LD_HEAD.L1_MISS_AT_RET/ (19.11%)
6,177,125 cpu_atom/TOPDOWN_FE_BOUND.BRANCH_RESTEER/ (18.78%)
228,066,346 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (18.38%)
5,204,897 cpu_atom/LD_HEAD.L1_BOUND_AT_RET/ (17.99%)
19,060,104 cpu_atom/TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER/ (17.58%)
0 cpu_atom/UOPS_RETIRED.FPDIV/ (17.19%)
864,565,692 cpu_core/TOPDOWN.SLOTS/ # 4.7 % tma_microcode_sequencer
# 0.4 % tma_few_uops_instructions
# 0.3 % tma_fused_instructions
# 1.8 % tma_memory_operations
# 0.1 % tma_nop_instructions
# 8.9 % tma_ms_switches
# 0.4 % tma_non_fused_branches
# 0.0 % tma_fp_arith
# 0.0 % tma_int_operations
# 35.7 % tma_ports_utilization
# 3.8 % tma_other_light_ops (18.03%)
100,519,954 cpu_core/topdown-retiring/ (18.03%)
68,964,454 cpu_core/topdown-bad-spec/ (18.03%)
44,732,021 cpu_core/topdown-heavy-ops/ (18.03%)
435,618,316 cpu_core/topdown-fe-bound/ (18.03%)
262,842,804 cpu_core/topdown-be-bound/ (18.03%)
10,368,608 cpu_core/BR_INST_RETIRED.ALL_BRANCHES/ (18.43%)
55,947,727 cpu_core/RESOURCE_STALLS.SCOREBOARD/ (18.84%)
125,718,255 cpu_core/UOPS_ISSUED.ANY/ (19.24%)
23,178,652 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ (19.65%)
0 cpu_core/INT_VEC_RETIRED.ADD_256/ (20.05%)
1,119,514 cpu_core/DSB2MITE_SWITCHES.PENALTY_CYCLES/ # 0.5 % tma_dsb_switches (20.46%)
27,684,795 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ # 10.6 % tma_l1_bound
# 0.7 % tma_l2_bound (20.86%)
108,813,079 cpu_core/UOPS_EXECUTED.THREAD/ (21.27%)
16,563,036 cpu_core/IDQ.MITE_CYCLES_ANY/ # 5.2 % tma_mite (19.14%)
53,037,471 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (19.14%)
41,005,510 cpu_core/UOPS_RETIRED.MS/ (19.14%)
575,534 cpu_core/ARITH.DIV_ACTIVE/ # 0.2 % tma_divider (19.14%)
0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (19.14%)
2,207,021 cpu_core/EXE_ACTIVITY.BOUND_ON_STORES/ # 0.9 % tma_store_bound (19.13%)
5,685,032 cpu_core/UOPS_RETIRED.MS,cmask=1,edge/ (19.13%)
25,523 cpu_core/DECODE.LCP/ # 0.0 % tma_lcp (19.12%)
26,095,298 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ # 10.8 % tma_l3_bound (19.13%)
108,516 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ # 0.0 % tma_dram_bound (19.13%)
192,239,590 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (19.12%)
5,978 cpu_core/LSD.CYCLES_ACTIVE/ # -0.0 % tma_lsd (19.12%)
0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (19.13%)
137,530,949 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.1 % tma_dsb (19.12%)
240,070,549 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 17.5 % tma_icache_misses
# 6.1 % tma_itlb_misses
# 40.3 % tma_branch_resteers (21.52%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (21.51%)
595,051 cpu_core/ARITH.DIV_ACTIVE/ (21.52%)
461,041 cpu_core/IDQ.DSB_CYCLES_ANY/ (21.51%)
0 cpu_core/INT_VEC_RETIRED.MUL_256/ (21.52%)
0 cpu_core/UOPS_EXECUTED.X87/ (21.52%)
237,196 cpu_core/IDQ.DSB_CYCLES_OK/ (21.52%)
125,009 cpu_core/LSD.CYCLES_OK/ (21.52%)
0 cpu_core/INT_VEC_RETIRED.ADD_128/ (21.40%)
28,388,778 cpu_core/MEM_UOP_RETIRED.ANY/ (18.61%)
1,806,629 cpu_core/INST_RETIRED.NOP/ (18.21%)
41,928,018 cpu_core/ICACHE_DATA.STALLS/ (17.81%)
0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (17.41%)
18,230,137 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL,umask=0xc/ (17.02%)
28,052,001 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (16.61%)
4,073,568 cpu_core/INST_RETIRED.MACRO_FUSED/ (16.20%)
66,509,871 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (15.92%)
2,307,447 cpu_core/IDQ.MITE_CYCLES_OK/ (15.91%)
30,345,769 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (15.91%)
0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (15.91%)
14,722,079 cpu_core/ICACHE_TAG.STALLS/ (15.90%)
1.004474469 seconds time elapsed
$ perf stat --metric-no-threshold --metric-no-group -M TopdownL4 -a sleep 1
Performance counter stats for 'system wide':
1,004,834,399 ns duration_time # 0.3 % tma_false_sharing
# 40.2 % tma_l3_hit_latency
# 4.4 % tma_contested_accesses
# 1.6 % tma_data_sharing
3,762,410 cpu_atom/LD_HEAD.PGWALK_AT_RET/ # 3.1 % tma_stlb_miss (33.58%)
10 cpu_atom/MACHINE_CLEARS.SMC/ # 0.0 % tma_smc (33.98%)
66,500,689 cpu_atom/TOPDOWN_BE_BOUND.MEM_SCHEDULER/ # 0.0 % tma_ld_buffer
# 0.0 % tma_rsv
# 11.0 % tma_st_buffer (29.60%)
1,051,312 cpu_atom/LD_HEAD.OTHER_AT_RET/ # 0.9 % tma_other_l1 (30.00%)
14,740,093 cpu_atom/UOPS_RETIRED.MS/ (30.39%)
117,899 cpu_atom/LD_HEAD.DTLB_MISS_AT_RET/ # 0.1 % tma_stlb_hit (30.79%)
701,548 cpu_atom/TOPDOWN_BAD_SPECULATION.NUKE/ # 0.0 % tma_disambiguation
# 0.0 % tma_fp_assist
# 0.1 % tma_memory_ordering
# 0.0 % tma_page_fault (31.08%)
12,873 cpu_atom/MACHINE_CLEARS.MEMORY_ORDERING/ (31.07%)
58,321 cpu_atom/MEM_SCHEDULER_BLOCK.LD_BUF/ (31.07%)
43,458 cpu_atom/MEM_SCHEDULER_BLOCK.RSV/ (31.07%)
14,256,005 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (31.06%)
122,156,534 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 0.0 % tma_store_fwd_blk (36.16%)
0 cpu_atom/MACHINE_CLEARS.FP_ASSIST/ (35.76%)
13,804 cpu_atom/MACHINE_CLEARS.SLOW/ (35.35%)
14,388,300 cpu_atom/MEM_SCHEDULER_BLOCK.ST_BUF/ (34.95%)
493,070,443 cpu_atom/CPU_CLK_UNHALTED.REF_TSC/ (39.73%)
2 cpu_atom/MACHINE_CLEARS.PAGE_FAULT/ (39.33%)
1,101 cpu_atom/LD_HEAD.ST_ADDR_AT_RET/ (38.93%)
929 cpu_atom/MACHINE_CLEARS.DISAMBIGUATION/ (38.55%)
14,241,213 cpu_atom/MEM_SCHEDULER_BLOCK.ALL/ (33.45%)
1,010,981,054 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_assists
# 4.3 % tma_cisc
# 0.0 % tma_fp_scalar
# 0.0 % tma_fp_vector
# 0.0 % tma_shuffles
# 0.0 % tma_int_vector_128b
# 0.0 % tma_x87_use
# 0.0 % tma_int_vector_256b
# 0.7 % tma_clears_resteers
# 12.4 % tma_mispredicts_resteers (8.14%)
132,375,316 cpu_core/topdown-retiring/ (8.14%)
88,303,327 cpu_core/topdown-bad-spec/ (8.14%)
85,519,216 cpu_core/topdown-br-mispredict/ (8.14%)
495,722,455 cpu_core/topdown-fe-bound/ (8.14%)
298,147,134 cpu_core/topdown-be-bound/ (8.14%)
21,418,803 cpu_core/UOPS_EXECUTED.CYCLES_GE_3/ # 8.8 % tma_ports_utilized_3m (10.12%)
35,208,716 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD,cmask=4/ # 14.5 % tma_mem_bandwidth
# 33.3 % tma_mem_latency (10.52%)
17,358 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (10.91%)
55,883,811 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 24.1 % tma_ports_utilized_0 (12.91%)
0 cpu_core/INT_VEC_RETIRED.ADD_256/ (14.89%)
139,890 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 2.8 % tma_dtlb_store (15.30%)
216,886 cpu_core/MEM_INST_RETIRED.LOCK_LOADS/ # 3.8 % tma_store_latency
# 0.1 % tma_lock_latency (15.71%)
115,948,790 cpu_core/UOPS_EXECUTED.THREAD/ (17.69%)
52,155,508 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (15.93%)
6 cpu_core/ASSISTS.ANY,umask=0x1B/ (15.93%)
87,422,517 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 5.2 % tma_dtlb_load (15.81%)
37,420,652 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (15.44%)
43,527,357 cpu_core/UOPS_RETIRED.MS/ (15.04%)
31,787,227 cpu_core/INT_MISC.CLEAR_RESTEER_CYCLES/ (14.64%)
0 cpu_core/FP_ARITH_INST_RETIRED.SCALAR_SINGLE,umask=0x03/ (14.24%)
4,899,130 cpu_core/XQ.FULL_CYCLES/ # 2.0 % tma_sq_full (13.84%)
1,365 cpu_core/OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM/ (13.44%)
23,904,338 cpu_core/EXE_ACTIVITY.1_PORTS_UTIL/ # 9.9 % tma_ports_utilized_1 (13.05%)
251,479 cpu_core/L2_RQSTS.ALL_RFO/ (12.76%)
188,701,010 cpu_core/CYCLE_ACTIVITY.STALLS_TOTAL/ (12.74%)
6,909 cpu_core/MEM_INST_RETIRED.SPLIT_STORES/ # 0.0 % tma_split_stores (12.74%)
619,775 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (9.56%)
136,716,345 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 0.9 % tma_decoder0_alone (11.15%)
0 cpu_core/INT_VEC_RETIRED.VNNI_128/ (12.74%)
605,850 cpu_core/L1D_PEND_MISS.FB_FULL/ # 0.2 % tma_fb_full (12.73%)
60,079 cpu_core/MEM_STORE_RETIRED.L2_HIT/ (11.14%)
242,508,080 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 4.2 % tma_ports_utilized_2
# 0.2 % tma_store_fwd_blk
# 0.0 % tma_streaming_stores
# 27.5 % tma_unknown_branches
# 0.0 % tma_split_loads (12.74%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE,umask=0x3c/ (14.33%)
32,573 cpu_core/LD_BLOCKS.STORE_FORWARD/ (12.74%)
1,130 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (12.74%)
4,029 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (9.56%)
4,844,548 cpu_core/INST_DECODED.DECODERS,cmask=1/ (9.56%)
5,266 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD/ (6.37%)
0 cpu_core/UOPS_EXECUTED.X87/ (7.96%)
0 cpu_core/INT_VEC_RETIRED.MUL_256/ (9.56%)
2,786,473 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (9.56%)
961,614,001 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (11.15%)
2,433,107 cpu_core/INST_DECODED.DECODERS,cmask=2/ (11.15%)
0 cpu_core/INT_VEC_RETIRED.ADD_128/ (12.74%)
9,058,046 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO/ (12.74%)
6,399,992 cpu_core/MEM_INST_RETIRED.ALL_STORES/ (12.74%)
45,519,749 cpu_core/L1D_PEND_MISS.PENDING/ (9.56%)
12,200,559 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (7.97%)
115,944,190 cpu_core/OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD/ (6.37%)
0 cpu_core/INT_VEC_RETIRED.VNNI_256/ (7.96%)
1,885,278 cpu_core/INT_MISC.UOP_DROPPING/ (9.56%)
524,819 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (9.56%)
26,866,872 cpu_core/EXE_ACTIVITY.3_PORTS_UTIL,umask=0x80/ (11.15%)
10,265,977 cpu_core/EXE_ACTIVITY.2_PORTS_UTIL/ (12.74%)
66,662,934 cpu_core/INT_MISC.UNKNOWN_BRANCH_CYCLES/ (12.74%)
0 cpu_core/OCR.STREAMING_WR.ANY_RESPONSE/ (12.74%)
12,499 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (12.74%)
0 cpu_core/INT_VEC_RETIRED.SHUFFLES/ (12.74%)
47,649 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (12.74%)
106,424 cpu_core/L2_RQSTS.RFO_HIT/ (12.74%)
0 cpu_core/LD_BLOCKS.NO_SR/ (7.97%)
1,343,692 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (7.96%)
28,517 cpu_core/L1D_PEND_MISS.L2_STALLS/ (6.37%)
394,101 cpu_core/MEM_LOAD_RETIRED.L3_HIT/ (6.36%)
76,860,165,929 TSC
1.004834399 seconds time elapsed
$ perf stat --metric-no-threshold --metric-no-group -M TopdownL5 -a sleep 1
Performance counter stats for 'system wide':
839,538,302 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_avx_assists
# 0.0 % tma_fp_assists
# 0.0 % tma_page_faults
# 0.0 % tma_fp_vector_128b
# 0.0 % tma_fp_vector_256b (32.40%)
100,274,045 cpu_core/topdown-retiring/ (32.40%)
77,425,642 cpu_core/topdown-bad-spec/ (32.40%)
424,563,652 cpu_core/topdown-fe-bound/ (32.40%)
245,420,564 cpu_core/topdown-be-bound/ (32.40%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE/ (32.79%)
54,372,921 cpu_core/RESOURCE_STALLS.SCOREBOARD/ # 22.2 % tma_serializing_operation (33.20%)
23,018,585 cpu_core/UOPS_DISPATCHED.PORT_6/ # 8.0 % tma_alu_op_utilization (33.61%)
17,748,101 cpu_core/UOPS_DISPATCHED.PORT_2_3_10/ # 4.2 % tma_load_op_utilization (34.02%)
0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE/ (34.43%)
7,616,700 cpu_core/UOPS_DISPATCHED.PORT_0/ (34.83%)
96,571 cpu_core/DTLB_STORE_MISSES.STLB_HIT,cmask=1/ # 0.6 % tma_store_stlb_hit (35.25%)
84,909,672 cpu_core/CYCLE_ACTIVITY.CYCLES_MEM_ANY/ # 0.2 % tma_load_stlb_hit (35.66%)
32,935,744 cpu_core/MEMORY_ACTIVITY.CYCLES_L1D_MISS/ (31.95%)
16,597,385 cpu_core/UOPS_DISPATCHED.PORT_5_11/ (31.95%)
9,452,844 cpu_core/UOPS_DISPATCHED.PORT_1/ (31.94%)
2,620,695 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ # 1.8 % tma_store_stlb_miss (31.95%)
15,699,364 cpu_core/UOPS_DISPATCHED.PORT_7_8/ # 5.7 % tma_store_op_utilization (31.95%)
0 cpu_core/FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE/ (31.94%)
142,096,670 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ (31.95%)
244,591,239 cpu_core/CPU_CLK_UNHALTED.THREAD/ # 5.2 % tma_load_stlb_miss
# 0.0 % tma_mixing_vectors (35.92%)
2,728,385 cpu_core/DTLB_STORE_MISSES.WALK_ACTIVE/ (35.66%)
0 cpu_core/ASSISTS.SSE_AVX_MIX/ (35.27%)
0 cpu_core/FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE/ (34.86%)
12,664,768 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.46%)
12,629,733 cpu_core/DTLB_LOAD_MISSES.WALK_ACTIVE/ (34.04%)
0 cpu_core/ASSISTS.FP/ (33.63%)
12 cpu_core/ASSISTS.PAGE_FAULT/ (33.23%)
16,704,699 cpu_core/UOPS_DISPATCHED.PORT_4_9/ (32.81%)
48,386 cpu_core/DTLB_LOAD_MISSES.STLB_HIT,cmask=1/ (28.68%)
1.002806967 seconds time elapsed
$ perf stat --metric-no-threshold --metric-no-group -M TopdownL6 -a sleep 1
Performance counter stats for 'system wide':
743,684 cpu_core/UOPS_DISPATCHED.PORT_0/ # 4.6 % tma_port_0
1,514 cpu_core/MISC2_RETIRED.LFENCE/ # 0.1 % tma_memory_fence
22,120 cpu_core/CPU_CLK_UNHALTED.PAUSE/ # 0.1 % tma_slow_pause
16,187,637 cpu_core/CPU_CLK_UNHALTED.DISTRIBUTED/ # 4.5 % tma_port_1
# 12.6 % tma_port_6
16,754,672 cpu_core/CPU_CLK_UNHALTED.THREAD/
728,805 cpu_core/UOPS_DISPATCHED.PORT_1/
2,040,181 cpu_core/UOPS_DISPATCHED.PORT_6/
1.002727371 seconds time elapsed
'''
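The percentages in parentheses in the outputs above, such as (72.50%),
are the share of the measurement interval during which an event was
actually scheduled on a counter. When events are multiplexed, perf
stat scales the raw count up by the ratio of enabled time to running
time. A minimal Python sketch of that scaling; the function names and
the input numbers are hypothetical, chosen only to make the
arithmetic easy to follow:

```python
def scale_count(raw: int, time_enabled: int, time_running: int) -> int:
    """Estimate the full-interval count from a multiplexed measurement."""
    if time_running == 0:
        # The event never ran, so there is nothing to extrapolate from.
        return 0
    return round(raw * time_enabled / time_running)


def running_percent(time_enabled: int, time_running: int) -> float:
    """Share of the enabled time during which the event was counting."""
    return 100.0 * time_running / time_enabled


if __name__ == "__main__":
    # Hypothetical values: the event ran for 725 of 1000 time units.
    raw, enabled, running = 725, 1000, 725
    print(scale_count(raw, enabled, running), f"({running_percent(enabled, running):.2f}%)")
```

The lower the percentage, the more the reported count is an estimate,
which is one reason reducing counter pressure matters for the deeper
Topdown levels.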
Using --cputype:
'''
$ perf stat --cputype=core -M TopdownL1 -a sleep 1
Performance counter stats for 'system wide':
90,542,172 cpu_core/TOPDOWN.SLOTS/ # 31.3 % tma_backend_bound
# 7.0 % tma_bad_speculation
# 54.0 % tma_frontend_bound
# 7.6 % tma_retiring
6,917,885 cpu_core/topdown-retiring/
6,242,227 cpu_core/topdown-bad-spec/
2,353,956 cpu_core/topdown-heavy-ops/
49,034,945 cpu_core/topdown-fe-bound/
28,390,484 cpu_core/topdown-be-bound/
98,299 cpu_core/INT_MISC.UOP_DROPPING/
1.002395582 seconds time elapsed
$ perf stat --cputype=atom -M TopdownL1 -a sleep 1
Performance counter stats for 'system wide':
645,836 cpu_atom/TOPDOWN_RETIRING.ALL/ # 26.4 % tma_bad_speculation
2,404,468 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.9 % tma_frontend_bound
1,455,604 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 23.6 % tma_backend_bound
# 23.6 % tma_backend_bound_aux
1,235,109 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 10.4 % tma_retiring
642,124 cpu_atom/TOPDOWN_RETIRING.ALL/
2,398,892 cpu_atom/TOPDOWN_FE_BOUND.ALL/
1,503,157 cpu_atom/TOPDOWN_BE_BOUND.ALL/
1.002061651 seconds time elapsed
'''
Ian Rogers (40):
perf stat: Introduce skippable evsels
perf vendor events intel: Add alderlake metric constraints
perf vendor events intel: Add icelake metric constraints
perf vendor events intel: Add icelakex metric constraints
perf vendor events intel: Add sapphirerapids metric constraints
perf vendor events intel: Add tigerlake metric constraints
perf stat: Avoid segv on counter->name
perf test: Test more sysfs events
perf test: Use valid for PMU tests
perf test: Mask config then test
perf test: Test more with config_cache
perf test: Roundtrip name, don't assume 1 event per name
perf parse-events: Set attr.type to PMU type early
perf print-events: Avoid unnecessary strlist
perf parse-events: Avoid scanning PMUs before parsing
perf test: Validate events with hyphens in
perf evsel: Modify group pmu name for software events
perf test: Move x86 hybrid tests to arch/x86
perf test x86 hybrid: Don't assume evlist order
perf parse-events: Support PMUs for legacy cache events
perf parse-events: Wildcard legacy cache events
perf print-events: Print legacy cache events for each PMU
perf parse-events: Support wildcards on raw events
perf parse-events: Remove now unused hybrid logic
perf parse-events: Minor type safety cleanup
perf parse-events: Add pmu filter
perf stat: Make cputype filter generic
perf test: Add cputype testing to perf stat
perf test: Fix parse-events tests for >1 core PMU
perf parse-events: Support hardware events as terms
perf parse-events: Avoid error when assigning a term
perf parse-events: Avoid error when assigning a legacy cache term
perf parse-events: Don't auto merge hybrid wildcard events
perf parse-events: Don't reorder atom cpu events
perf metrics: Be PMU specific for referenced metrics.
perf metric: Json flag to not group events if gathering a metric group
perf stat: Command line PMU metric filtering
perf vendor events intel: Correct alderlake metrics
perf jevents: Don't rewrite metrics across PMUs
perf metrics: Be PMU specific in event match
tools/perf/arch/x86/include/arch-tests.h | 1 +
tools/perf/arch/x86/tests/Build | 1 +
tools/perf/arch/x86/tests/arch-tests.c | 10 +
tools/perf/arch/x86/tests/hybrid.c | 275 ++++++
tools/perf/arch/x86/util/evlist.c | 4 +-
tools/perf/builtin-list.c | 19 +-
tools/perf/builtin-record.c | 13 +-
tools/perf/builtin-stat.c | 73 +-
tools/perf/builtin-top.c | 5 +-
tools/perf/builtin-trace.c | 5 +-
.../arch/x86/alderlake/adl-metrics.json | 275 +++---
.../arch/x86/alderlaken/adln-metrics.json | 20 +-
.../arch/x86/broadwell/bdw-metrics.json | 12 +
.../arch/x86/broadwellde/bdwde-metrics.json | 12 +
.../arch/x86/broadwellx/bdx-metrics.json | 12 +
.../arch/x86/cascadelakex/clx-metrics.json | 12 +
.../arch/x86/haswell/hsw-metrics.json | 12 +
.../arch/x86/haswellx/hsx-metrics.json | 12 +
.../arch/x86/icelake/icl-metrics.json | 23 +
.../arch/x86/icelakex/icx-metrics.json | 23 +
.../arch/x86/ivybridge/ivb-metrics.json | 12 +
.../arch/x86/ivytown/ivt-metrics.json | 12 +
.../arch/x86/jaketown/jkt-metrics.json | 12 +
.../arch/x86/sandybridge/snb-metrics.json | 12 +
.../arch/x86/sapphirerapids/spr-metrics.json | 23 +
.../arch/x86/skylake/skl-metrics.json | 12 +
.../arch/x86/skylakex/skx-metrics.json | 12 +
.../arch/x86/tigerlake/tgl-metrics.json | 23 +
tools/perf/pmu-events/jevents.py | 10 +-
tools/perf/pmu-events/metric.py | 28 +-
tools/perf/pmu-events/metric_test.py | 6 +-
tools/perf/pmu-events/pmu-events.h | 2 +
tools/perf/tests/evsel-roundtrip-name.c | 119 ++-
tools/perf/tests/parse-events.c | 826 +++++++++---------
tools/perf/tests/pmu-events.c | 12 +-
tools/perf/tests/shell/stat.sh | 44 +
tools/perf/util/Build | 1 -
tools/perf/util/evlist.h | 1 -
tools/perf/util/evsel.c | 30 +-
tools/perf/util/evsel.h | 1 +
tools/perf/util/metricgroup.c | 111 ++-
tools/perf/util/metricgroup.h | 3 +-
tools/perf/util/parse-events-hybrid.c | 214 -----
tools/perf/util/parse-events-hybrid.h | 25 -
tools/perf/util/parse-events.c | 646 ++++++--------
tools/perf/util/parse-events.h | 61 +-
tools/perf/util/parse-events.l | 108 +--
tools/perf/util/parse-events.y | 222 ++---
tools/perf/util/pmu-hybrid.c | 20 -
tools/perf/util/pmu-hybrid.h | 1 -
tools/perf/util/pmu.c | 16 +-
tools/perf/util/pmu.h | 3 +
tools/perf/util/pmus.c | 25 +-
tools/perf/util/pmus.h | 3 +
tools/perf/util/print-events.c | 85 +-
tools/perf/util/stat-display.c | 6 +-
56 files changed, 1939 insertions(+), 1627 deletions(-)
create mode 100644 tools/perf/arch/x86/tests/hybrid.c
delete mode 100644 tools/perf/util/parse-events-hybrid.c
delete mode 100644 tools/perf/util/parse-events-hybrid.h
--
2.40.1.495.gc816e09b53d-goog