[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230623003312.3981075-12-irogers@google.com>
Date: Thu, 22 Jun 2023 17:33:11 -0700
From: Ian Rogers <irogers@...gle.com>
To: eter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
Kajol Jain <kjain@...ux.ibm.com>,
John Garry <john.g.garry@...cle.com>,
Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>,
Jing Zhang <renyu.zj@...ux.alibaba.com>,
Sohom Datta <sohomdatta1@...il.com>,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Perry Taylor <perry.taylor@...el.com>,
Samantha Alt <samantha.alt@...el.com>,
Caleb Biggers <caleb.biggers@...el.com>,
Weilin Wang <weilin.wang@...el.com>,
Edward Baker <edward.baker@...el.com>
Subject: [PATCH v1 11/12] perf vendor events intel: Update skylakex to 1.31
Updates were released in:
https://github.com/intel/perfmon/commit/cdaa69afe7a48a217b1d89320a27efc6e650cec3
Adds the events IDQ.DSB_CYCLES_OK, IDQ.DSB_CYCLES_ANY,
ICACHE_TAG.STALLS, DECODE.LCP, LSD.CYCLES_OK. Descriptions are also
updated.
Signed-off-by: Ian Rogers <irogers@...gle.com>
---
tools/perf/pmu-events/arch/x86/mapfile.csv | 2 +-
.../arch/x86/skylakex/frontend.json | 43 ++++++++++++++++---
.../arch/x86/skylakex/pipeline.json | 17 ++++++--
.../x86/skylakex/uncore-interconnect.json | 2 +-
.../arch/x86/skylakex/uncore-memory.json | 2 +-
5 files changed, 54 insertions(+), 12 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index 5104b93d57ab..7c6598a9b240 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -28,7 +28,7 @@ GenuineIntel-6-(8F|CF),v1.14,sapphirerapids,core
GenuineIntel-6-AF,v1.00,sierraforest,core
GenuineIntel-6-(37|4A|4C|4D|5A),v15,silvermont,core
GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v57,skylake,core
-GenuineIntel-6-55-[01234],v1.30,skylakex,core
+GenuineIntel-6-55-[01234],v1.31,skylakex,core
GenuineIntel-6-86,v1.21,snowridgex,core
GenuineIntel-6-8[CD],v1.12,tigerlake,core
GenuineIntel-6-2C,v4,westmereep-dp,core
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/frontend.json b/tools/perf/pmu-events/arch/x86/skylakex/frontend.json
index 04f08e4d2402..095904c77001 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/frontend.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/frontend.json
@@ -7,6 +7,14 @@
"SampleAfterValue": "100003",
"UMask": "0x1"
},
+ {
+ "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]",
+ "EventCode": "0x87",
+ "EventName": "DECODE.LCP",
+ "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x1"
+ },
{
"BriefDescription": "Decode Stream Buffer (DSB)-to-MITE switches",
"EventCode": "0xAB",
@@ -245,27 +253,34 @@
"UMask": "0x2"
},
{
- "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
+ "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]",
"EventCode": "0x83",
"EventName": "ICACHE_64B.IFTAG_STALL",
"SampleAfterValue": "200003",
"UMask": "0x4"
},
{
- "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops",
+ "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]",
+ "EventCode": "0x83",
+ "EventName": "ICACHE_TAG.STALLS",
+ "SampleAfterValue": "200003",
+ "UMask": "0x4"
+ },
+ {
+ "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.DSB_CYCLES_OK]",
"CounterMask": "4",
"EventCode": "0x79",
"EventName": "IDQ.ALL_DSB_CYCLES_4_UOPS",
- "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
+ "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_OK]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
{
- "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
+ "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.DSB_CYCLES_ANY]",
"CounterMask": "1",
"EventCode": "0x79",
"EventName": "IDQ.ALL_DSB_CYCLES_ANY_UOPS",
- "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ.",
+ "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.DSB_CYCLES_ANY]",
"SampleAfterValue": "2000003",
"UMask": "0x18"
},
@@ -296,6 +311,24 @@
"SampleAfterValue": "2000003",
"UMask": "0x8"
},
+ {
+ "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
+ "CounterMask": "1",
+ "EventCode": "0x79",
+ "EventName": "IDQ.DSB_CYCLES_ANY",
+ "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_ANY_UOPS]",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x18"
+ },
+ {
+ "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering 4 Uops [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
+ "CounterMask": "4",
+ "EventCode": "0x79",
+ "EventName": "IDQ.DSB_CYCLES_OK",
+ "PublicDescription": "Counts the number of cycles 4 uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path. Count includes uops that may 'bypass' the IDQ. [This event is alias to IDQ.ALL_DSB_CYCLES_4_UOPS]",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x18"
+ },
{
"BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path",
"EventCode": "0x79",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
index 31a1663d57f8..66d686cc933e 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/pipeline.json
@@ -361,10 +361,10 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Stalls caused by changing prefix length of the instruction.",
+ "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]",
"EventCode": "0x87",
"EventName": "ILD_STALL.LCP",
- "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
+ "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
@@ -488,11 +488,11 @@
"UMask": "0x1"
},
{
- "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder.",
+ "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_OK]",
"CounterMask": "4",
"EventCode": "0xA8",
"EventName": "LSD.CYCLES_4_UOPS",
- "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector).",
+ "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_OK]",
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
@@ -505,6 +505,15 @@
"SampleAfterValue": "2000003",
"UMask": "0x1"
},
+ {
+ "BriefDescription": "Cycles 4 Uops delivered by the LSD, but didn't come from the decoder. [This event is alias to LSD.CYCLES_4_UOPS]",
+ "CounterMask": "4",
+ "EventCode": "0xA8",
+ "EventName": "LSD.CYCLES_OK",
+ "PublicDescription": "Counts the cycles when 4 uops are delivered by the LSD (Loop-stream detector). [This event is alias to LSD.CYCLES_4_UOPS]",
+ "SampleAfterValue": "2000003",
+ "UMask": "0x1"
+ },
{
"BriefDescription": "Number of Uops delivered by the LSD.",
"EventCode": "0xA8",
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
index 26a5a20bf37a..3eece8a728b5 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-interconnect.json
@@ -6504,7 +6504,7 @@
"EventCode": "0x52",
"EventName": "UNC_M3UPI_RxC_HELD.PARALLEL_SUCCESS",
"PerPkg": "1",
- "PublicDescription": "ad and bl messages were actually slotted into the same flit in paralle",
+ "PublicDescription": "ad and bl messages were actually slotted into the same flit in parallel",
"UMask": "0x8",
"Unit": "M3UPI"
},
diff --git a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
index 6f8ff2262ce7..7a40aa0f1018 100644
--- a/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
+++ b/tools/perf/pmu-events/arch/x86/skylakex/uncore-memory.json
@@ -1952,7 +1952,7 @@
"EventCode": "0x81",
"EventName": "UNC_M_WPQ_OCCUPANCY",
"PerPkg": "1",
- "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts. Is there a filter of sorts?",
+ "PublicDescription": "Counts the number of entries in the Write Pending Queue (WPQ) at each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations). The WPQ is used to schedule writes out to the memory controller and to track the requests. Requests allocate into the WPQ soon after they enter the memory controller, and need credits for an entry in this buffer before being sent from the CHA to the iMC (memory controller). They deallocate after being issued to DRAM. Write requests themselves are able to complete (from the perspective of the rest of the system) as soon they have 'posted' to the iMC. This is not to be confused with actually performing the write to DRAM. Therefore, the average latency for this queue is actually not useful for deconstruction intermediate write latencies. So, we provide filtering based on if the request has posted or not. By using the 'not posted' filter, we can track how long writes spent in the iMC before completions were sent to the HA. The 'posted' filter, on the other hand, provides information about how much queueing is actually happening in the iMC for writes before they are actually issued to memory. High average occupancies will generally coincide with high write major mode counts.",
"Unit": "iMC"
},
{
--
2.41.0.162.gfafddb0af9-goog
Powered by blists - more mailing lists