linux-kernel - Re: [PATCH 2/2] perf vendor events intel: Update metrics from TMAM 3.6

Message-ID: <9635d0eb-1811-74f0-b9a5-a9bb8959f2bc@intel.com>
Date:   Thu, 31 Oct 2019 09:44:35 +0800
From:   Haiyan Song <haiyanx.song@...el.com>
To:     "Liang, Kan" <kan.liang@...ux.intel.com>, acme@...nel.org,
        jolsa@...nel.org, peterz@...radead.org, mingo@...hat.com,
        alexander.shishkin@...ux.intel.com
Cc:     Linux-kernel@...r.kernel.org, ak@...ux.intel.com,
        kan.liang@...el.com, yao.jin@...el.com, andi.kleen@...el.com
Subject: Re: [PATCH 2/2] perf vendor events intel: Update metrics from TMAM
 3.6

Hi Kan,

Thanks for your review. I've added the Signed-off-by in the v2 patch.

-- 
Best regards,
Haiyan Song
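
Note on the quoted diff: the TopdownL1 entries in bdw-metrics.json are plain
arithmetic over raw counter values. The following is a minimal Python sketch
of the non-SMT expressions and of the SMT core-clock term, assuming the
counts have already been collected. The function names, argument names and
any sample counts are hypothetical illustrations; they are not part of the
patch or of perf itself.

```python
def topdown_level1(cycles, idq_uops_not_delivered_core, uops_issued_any,
                   uops_retired_retire_slots, int_misc_recovery_cycles):
    """Evaluate the non-SMT TopdownL1 MetricExpr strings from the diff.

    All arguments are raw event counts for the same measurement interval.
    """
    slots = 4 * cycles  # SLOTS: "4 * cycles"
    # Frontend_Bound: "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)"
    frontend = idq_uops_not_delivered_core / slots
    # Bad_Speculation: "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS
    #                     + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)"
    bad_spec = (uops_issued_any - uops_retired_retire_slots
                + 4 * int_misc_recovery_cycles) / slots
    # Retiring: "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)"
    retiring = uops_retired_retire_slots / slots
    # Backend_Bound: whatever is left after the other three categories
    backend = 1 - (frontend + bad_spec + retiring)
    return {
        "Frontend_Bound": frontend,
        "Bad_Speculation": bad_spec,
        "Retiring": retiring,
        "Backend_Bound": backend,
    }


def core_clks_smt(clk_unhalted_thread, clk_one_thread_active, clk_ref_xclk):
    """CORE_CLKS: "( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 +
    CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )"."""
    return (clk_unhalted_thread / 2) * (1 + clk_one_thread_active / clk_ref_xclk)
```

The four level-1 fractions sum to 1 by construction, since Backend_Bound is
defined as the remainder. The *_SMT variants in the diff use the same
formulas with "cycles" replaced by the CORE_CLKS term.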

On 10/30/19 8:31 PM, Liang, Kan wrote:
> 
> 
> On 10/30/2019 4:23 AM, Haiyan Song wrote:
>> Update all the Intel JSON metrics from TMAM 3.6.
>>
>> New Metrics:
>> - DSB_Switches: fraction of cycles CPU was stalled due to switches 
>> from DSB to MITE pipeline [all]
>> - L2_Evictions_{Silent|NonSilent}_PKI: L2 {silent|non silent} 
>> evictions rate per Kilo instruction [SKX+]
>> - IpFarBranch - Instructions per Far Branch
>>
>> Other Enhancements & fixes:
>> - KBLR/CFL & CLX move to separate columns (no column sharing via if 
>> #model)
>> - Re-organized/renamed Metric Group
> 
> Signed-off-by is missing here.
> 
> Thanks,
> Kan
> 
>> ---
>>   .../pmu-events/arch/x86/broadwell/bdw-metrics.json | 178 
>> ++++++++---------
>>   .../arch/x86/broadwellx/bdx-metrics.json           | 184 
>> +++++++++---------
>>   .../arch/x86/cascadelakex/clx-metrics.json         | 210 
>> +++++++++++----------
>>   .../pmu-events/arch/x86/haswell/hsw-metrics.json   | 164 
>> ++++++++--------
>>   .../pmu-events/arch/x86/haswellx/hsx-metrics.json  | 170 
>> ++++++++---------
>>   .../pmu-events/arch/x86/ivybridge/ivb-metrics.json | 170 
>> ++++++++---------
>>   .../pmu-events/arch/x86/ivytown/ivt-metrics.json   | 172 
>> ++++++++---------
>>   .../pmu-events/arch/x86/jaketown/jkt-metrics.json  | 114 +++++------
>>   .../arch/x86/sandybridge/snb-metrics.json          | 112 +++++------
>>   .../pmu-events/arch/x86/skylake/skl-metrics.json   | 188 
>> +++++++++---------
>>   .../pmu-events/arch/x86/skylakex/skx-metrics.json  | 204 
>> +++++++++++---------
>>   11 files changed, 954 insertions(+), 912 deletions(-)
>>
>> diff --git a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
>> index 212b117a8ffb..bc7151d639d7 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
>> @@ -1,352 +1,352 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (12 
>> * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY 
>> ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / 
>> (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "Branch_Misprediction_Cost"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) 
>> + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> * (12 * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + 
>> BACLEARS.ANY ) / cycles) / (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / 
>> BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts_SMT",
>> +        "MetricGroup": "BrMispredicts_SMT",
>>           "MetricName": "Branch_Misprediction_Cost_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( cpu@...B_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( 
>> DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / cycles",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( cpu@...B_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( 
>> DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / cycles",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( cpu@...B_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( 
>> DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * 
>> ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) 
>> ))",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( cpu@...B_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_LOAD_MISSES.WALK_DURATION\\,cmask\\=1@ + 
>> cpu@...B_STORE_MISSES.WALK_DURATION\\,cmask\\=1@ + 7 * ( 
>> DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * 
>> ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) 
>> ))",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
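
A note for readers of the GFLOPs expression above: each FP_ARITH_INST_RETIRED event is weighted by the number of elements the instruction operates on (1 for scalar, 2/4 for 128-bit double/single, 4/8 for 256-bit double/single). A minimal Python sketch of the same arithmetic follows. The function name and the counter values in the usage note are hypothetical and only mirror the MetricExpr; this is not how perf itself evaluates the metric.

```python
# Weights taken from the MetricExpr above: elements processed per instruction.
FLOPS_PER_INSTRUCTION = {
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
}


def gflops(counts, duration_time):
    """Weighted FP operation count, scaled to 1e9 operations per second.

    counts: dict mapping event name -> raw count (missing events count as 0).
    duration_time: wall-clock seconds of the measurement.
    """
    total_ops = sum(
        weight * counts.get(event, 0)
        for event, weight in FLOPS_PER_INSTRUCTION.items()
    )
    return total_ops / 1_000_000_000 / duration_time
```

With made-up counts of 1e9 scalar-double and 5e8 256-bit packed-single instructions over 2 seconds, gflops() gives (1e9 + 8 * 5e8) / 1e9 / 2 = 2.5.
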
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
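
For readers following the bdw-metrics.json hunk above: several of the expressions share a few simple shapes. The sketch below restates three of them in Python, assuming raw counter values are already available as plain numbers. The helper names are hypothetical and exist only for illustration; perf evaluates the MetricExpr strings with its own expression parser.

```python
CACHE_LINE_BYTES = 64  # the constant 64 in the *_Cache_Fill_BW expressions


def fill_bw_gb_per_sec(lines_filled, duration_time):
    """Shape of L1D/L2/L3_Cache_Fill_BW: 64 * lines / 1e9 / seconds."""
    return CACHE_LINE_BYTES * lines_filled / 1_000_000_000 / duration_time


def per_kilo_instruction(events, instructions):
    """Shape of L1MPKI, L2MPKI, L3MPKI: 1000 * events / INST_RETIRED.ANY."""
    return 1000 * events / instructions


def core_clks_smt(thread_clks, one_thread_active, ref_xclk):
    """Shape of CORE_CLKS:
    (CPU_CLK_UNHALTED.THREAD / 2) *
    (1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK)
    """
    return (thread_clks / 2) * (1 + one_thread_active / ref_xclk)
```

The same CORE_CLKS sub-expression is what appears inlined, over and over, in the long *_SMT MetricExpr strings, which is why they look so repetitive.
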
>> diff --git 
>> a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>> index c6f9762f32c0..113d19e92678 100644
>> --- a/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
>> @@ -1,370 +1,370 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
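
For readers: the four non-SMT TopdownL1 expressions above partition the 4 * cycles issue slots, and Backend_Bound is defined as whatever remains after the other three. A minimal Python sketch of that relationship follows, assuming the raw event counts are given as numbers. The function and parameter names are hypothetical; only the arithmetic is taken from the MetricExpr strings.

```python
def topdown_l1(idq_uops_not_delivered_core, uops_issued_any,
               uops_retired_retire_slots, int_misc_recovery_cycles, cycles):
    """Evaluate the four non-SMT TopdownL1 fractions from raw counts."""
    slots = 4 * cycles  # the SLOTS metric: 4 issue slots per cycle

    frontend_bound = idq_uops_not_delivered_core / slots
    bad_speculation = (uops_issued_any - uops_retired_retire_slots
                       + 4 * int_misc_recovery_cycles) / slots
    retiring = uops_retired_retire_slots / slots
    # Backend_Bound is the remainder, so the four fractions sum to 1.
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)

    return {
        "Frontend_Bound": frontend_bound,
        "Bad_Speculation": bad_speculation,
        "Backend_Bound": backend_bound,
        "Retiring": retiring,
    }
```

The *_SMT variants have the same structure; they replace cycles with the CORE_CLKS expression and INT_MISC.RECOVERY_CYCLES with INT_MISC.RECOVERY_CYCLES_ANY / 2.
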
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (12 
>> * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + BACLEARS.ANY 
>> ) / cycles) / (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / 
>> (4 * cycles)) ) * (4 * cycles) / BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "Branch_Misprediction_Cost"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) 
>> + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> * (12 * ( BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT + 
>> BACLEARS.ANY ) / cycles) / (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / 
>> BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts_SMT",
>> +        "MetricGroup": "BrMispredicts_SMT",
>>           "MetricName": "Branch_Misprediction_Cost_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * 
>> ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * cycles )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * 
>> ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * cycles )",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * 
>> ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 
>> 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / 
>> CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION + 7 * 
>> ( DTLB_STORE_MISSES.WALK_COMPLETED + DTLB_LOAD_MISSES.WALK_COMPLETED + 
>> ITLB_MISSES.WALK_COMPLETED ) ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 
>> 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / 
>> CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -        "MetricExpr": "1000000000 * ( 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( 
>> cbox_0@...nt\\=0x0@ / duration_time )",
>>           "BriefDescription": "Average latency of data read request to 
>> external memory (in nanoseconds). Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "1000000000 * ( 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( 
>> cbox_0@...nt\\=0x0@ / duration_time )",
>>           "MetricGroup": "Memory_Lat",
>>           "MetricName": "DRAM_Read_Latency"
>>       },
>>       {
>> -        "MetricExpr": 
>> "cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@",
>>           "BriefDescription": "Average number of parallel data read 
>> requests to external memory. Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": 
>> "cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_Parallel_Reads"
>>       },
>>       {
>> -        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "BriefDescription": "Socket actual clocks when any core is 
>> active on that socket",
>> +        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "MetricGroup": "",
>>           "MetricName": "Socket_CLKS"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
>> diff --git 
>> a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
>> index a382b115633d..2ba32af9bc36 100644
>> --- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
>> @@ -1,394 +1,412 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( 
>> ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( 
>> ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ))",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS)",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (( 
>> INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 
>> * cycles) / BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "Branch_Misprediction_Cost"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) 
>> + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 
>> * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / 
>> BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts_SMT",
>> +        "MetricGroup": "BrMispredicts_SMT",
>>           "MetricName": "Branch_Misprediction_Cost_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * cycles )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * cycles )",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 
>> 1000000000 / duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 
>> 1000000000 / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Access_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>> +        "BriefDescription": "Rate of silent evictions from the L2 
>> cache per Kilo instruction where the evicted lines are dropped (no 
>> writeback to L3 or memory)",
>> +        "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY",
>> +        "MetricGroup": "",
>> +        "MetricName": "L2_Evictions_Silent_PKI"
>> +    },
>> +    {
>> +        "BriefDescription": "Rate of non silent evictions from the L2 
>> cache per Kilo instruction",
>> +        "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "",
>> +        "MetricName": "L2_Evictions_NonSilent_PKI"
>> +    },
>> +    {
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -    "MetricExpr": "1000000000 * ( 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / 
>> cha@...nt\\=0x35\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ ) / ( 
>> cha_0@...nt\\=0x0@ / duration_time )",
>>           "BriefDescription": "Average latency of data read request to 
>> external memory (in nanoseconds). Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "1000000000 * ( 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21@ / 
>> cha@...nt\\=0x35\\\\\\,umask\\=0x21@ ) / ( cha_0@...nt\\=0x0@ / 
>> duration_time )",
>>           "MetricGroup": "Memory_Lat",
>>           "MetricName": "DRAM_Read_Latency"
>>       },
>>       {
>> -    "MetricExpr": 
>> "cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1\\\\\\,config\\=0x40433@", 
>>           "BriefDescription": "Average number of parallel data read 
>> requests to external memory. Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "cha@...nt\\=0x36\\\\\\,umask\\=0x21@ / 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1@",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_Parallel_Reads"
>>       },
>>       {
>> -        "MetricExpr": "( 1000000000 * ( 
>> imc@...nt\\=0xe0\\\\\\,umask\\=0x1@ / imc@...nt\\=0xe3@ ) / 
>> imc_0@...nt\\=0x0@ ) if 1 if 1 == 1 else 0 else 0",
>>           "BriefDescription": "Average latency of data read request to 
>> external 3D X-Point memory [in nanoseconds]. Accounts for demand loads 
>> and L1/L2 data-read prefetches",
>> +        "MetricExpr": "( 1000000000 * ( 
>> imc@...nt\\=0xe0\\\\\\,umask\\=0x1@ / imc@...nt\\=0xe3@ ) / 
>> imc_0@...nt\\=0x0@ ) if 1 if 0 == 1 else 0 else 0",
>>           "MetricGroup": "Memory_Lat",
>>           "MetricName": "MEM_PMM_Read_Latency"
>>       },
>>       {
>> -        "MetricExpr": "( ( 64 * imc@...nt\\=0xe3@ / 1000000000 ) / 
>> duration_time ) if 1 if 1 == 1 else 0 else 0",
>>           "BriefDescription": "Average 3DXP Memory Bandwidth Use for 
>> reads [GB / sec]",
>> +        "MetricExpr": "( ( 64 * imc@...nt\\=0xe3@ / 1000000000 ) / 
>> duration_time ) if 1 if 0 == 1 else 0 else 0",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "PMM_Read_BW"
>>       },
>>       {
>> -        "MetricExpr": "( ( 64 * imc@...nt\\=0xe7@ / 1000000000 ) / 
>> duration_time ) if 1 if 1 == 1 else 0 else 0",
>>           "BriefDescription": "Average 3DXP Memory Bandwidth Use for 
>> Writes [GB / sec]",
>> +        "MetricExpr": "( ( 64 * imc@...nt\\=0xe7@ / 1000000000 ) / 
>> duration_time ) if 1 if 0 == 1 else 0 else 0",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "PMM_Write_BW"
>>       },
>>       {
>> -        "MetricExpr": "cha_0@...nt\\=0x0@",
>>           "BriefDescription": "Socket actual clocks when any core is 
>> active on that socket",
>> +        "MetricExpr": "cha_0@...nt\\=0x0@",
>>           "MetricGroup": "",
>>           "MetricName": "Socket_CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Far Branch ( Far 
>> Branches apply upon transition from application to operating system, 
>> handling interrupts, exceptions. )",
>> +        "MetricExpr": "INST_RETIRED.ANY / ( 
>> BR_INST_RETIRED.FAR_BRANCH / 2 )",
>> +        "MetricGroup": "",
>> +        "MetricName": "IpFarBranch"
>> +    },
>> +    {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
>> diff --git a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
>> index 21b27488b621..c80f16fde6d0 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
>> @@ -1,322 +1,322 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurance rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else 
>> UOPS_EXECUTED.CORE / (( cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if 
>> #SMT_on else cpu@...S_EXECUTED.CORE\\,cmask\\=1@)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else 
>> UOPS_EXECUTED.CORE / (( cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if 
>> #SMT_on else cpu@...S_EXECUTED.CORE\\,cmask\\=1@)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
>> diff --git a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
>> index e5aac148c941..e501729c3dd1 100644
>> --- a/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
>> @@ -1,340 +1,340 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , IDQ.MITE_UOPS / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 16 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4.0 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else 
>> UOPS_EXECUTED.CORE / (( cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if 
>> #SMT_on else cpu@...S_EXECUTED.CORE\\,cmask\\=1@)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "( UOPS_EXECUTED.CORE / 2 / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@) ) if #SMT_on else 
>> UOPS_EXECUTED.CORE / (( cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if 
>> #SMT_on else cpu@...S_EXECUTED.CORE\\,cmask\\=1@)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -        "MetricExpr": "1000000000 * ( 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( 
>> cbox_0@...nt\\=0x0@ / duration_time )",
>>           "BriefDescription": "Average latency of data read request to 
>> external memory (in nanoseconds). Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "1000000000 * ( 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x35\\,umask\\=0x3\\,filter_opc\\=0x182@ ) / ( 
>> cbox_0@...nt\\=0x0@ / duration_time )",
>>           "MetricGroup": "Memory_Lat",
>>           "MetricName": "DRAM_Read_Latency"
>>       },
>>       {
>> -        "MetricExpr": 
>> "cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@",
>>           "BriefDescription": "Average number of parallel data read 
>> requests to external memory. Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": 
>> "cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182@ / 
>> cbox@...nt\\=0x36\\,umask\\=0x3\\,filter_opc\\=0x182\\,thresh\\=1@",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_Parallel_Reads"
>>       },
>>       {
>> -        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "BriefDescription": "Socket actual clocks when any core is 
>> active on that socket",
>> +        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "MetricGroup": "",
>>           "MetricName": "Socket_CLKS"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
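
For readers following the hunk quoted above, here is a minimal Python sketch of how three of the expressions evaluate: CORE_CLKS, SLOTS_SMT and L1D_Cache_Fill_BW. It is not part of the patch. The function names and the sample counter values are made up for illustration; only the arithmetic is taken from the quoted MetricExpr strings.

```python
# Illustrative only: the function names are hypothetical, the arithmetic
# mirrors the MetricExpr strings quoted in the patch.

def core_clks(thread_clks, one_thread_active, ref_xclk):
    """CORE_CLKS: ( CPU_CLK_UNHALTED.THREAD / 2 ) *
    ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK )"""
    return (thread_clks / 2) * (1 + one_thread_active / ref_xclk)


def slots_smt(thread_clks, one_thread_active, ref_xclk):
    """SLOTS_SMT: 4 issue slots per core clock."""
    return 4 * core_clks(thread_clks, one_thread_active, ref_xclk)


def l1d_cache_fill_bw(l1d_replacement, duration_time):
    """L1D_Cache_Fill_BW in GB/sec:
    64 * L1D.REPLACEMENT / 1000000000 / duration_time"""
    return 64 * l1d_replacement / 1_000_000_000 / duration_time


if __name__ == "__main__":
    # Made-up counter values, chosen only to keep the arithmetic easy to follow.
    print(core_clks(1000, 50, 100))
    print(slots_smt(1000, 50, 100))
    print(l1d_cache_fill_bw(1_000_000_000, 64.0))
```

The per-core clock term is shared by SLOTS_SMT, CoreIPC_SMT and Page_Walks_Utilization_SMT, so it is factored into one helper here.
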
>> diff --git a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
>> index bc4d5fc284a0..e2446966b651 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
>> @@ -1,340 +1,340 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
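
As a reading aid for the four TopdownL1 entries quoted above (non-SMT variants), the sketch below evaluates them together. It is not part of the patch. The function name, the dictionary keys and the sample counts are hypothetical; the formulas are copied from the quoted MetricExpr strings.

```python
# Illustrative only: evaluates the non-SMT TopdownL1 expressions from the
# quoted ivb-metrics.json hunk on made-up counter values.

def topdown_l1(cycles, idq_uops_not_delivered_core, uops_issued_any,
               uops_retired_retire_slots, int_misc_recovery_cycles):
    slots = 4 * cycles  # 4 issue slots per cycle, as in "(4 * cycles)"
    frontend_bound = idq_uops_not_delivered_core / slots
    bad_speculation = (uops_issued_any - uops_retired_retire_slots
                       + 4 * int_misc_recovery_cycles) / slots
    retiring = uops_retired_retire_slots / slots
    # Backend_Bound is defined as the remainder of the other three.
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)
    return {
        "Frontend_Bound": frontend_bound,
        "Bad_Speculation": bad_speculation,
        "Backend_Bound": backend_bound,
        "Retiring": retiring,
    }


if __name__ == "__main__":
    result = topdown_l1(cycles=1000,
                        idq_uops_not_delivered_core=800,
                        uops_issued_any=2200,
                        uops_retired_retire_slots=2000,
                        int_misc_recovery_cycles=50)
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

Because Backend_Bound is computed as one minus the other three, the four fractions sum to 1 by construction.
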
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
>> diff --git a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
>> index f3874b5f9995..9294769dec64 100644
>> --- a/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
>> @@ -1,346 +1,346 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This includes slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to mispredicted branches 
>> is categorized under the Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This includes slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to mispredicted branches 
>> is categorized under the Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessarily mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessarily mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> cpu@...S_EXECUTED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CYCLES_GE_1_UOP_EXEC)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_UOPS_RETIRED.L1_MISS + mem_load_uops_retired.hit_lfb )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> cycles",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_DURATION + 
>> DTLB_LOAD_MISSES.WALK_DURATION + DTLB_STORE_MISSES.WALK_DURATION ) / 
>> (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_UOPS_RETIRED.LLC_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:k / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "BriefDescription": "Socket actual clocks when any core is 
>> active on that socket",
>> +        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "MetricGroup": "",
>>           "MetricName": "Socket_CLKS"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
>> diff --git a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
>> index 98c73e430b05..603ff9c2e9a1 100644
>> --- a/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
>> @@ -1,232 +1,232 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_DISPATCHED.THREAD / (( 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_DISPATCHED.THREAD / (( 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "BriefDescription": "Socket actual clocks when any core is 
>> active on that socket",
>> +        "MetricExpr": "cbox_0@...nt\\=0x0@",
>>           "MetricGroup": "",
>>           "MetricName": "Socket_CLKS"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
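
For anyone reading along, the four non-SMT TopdownL1 expressions in the file above share the same denominator (SLOTS = 4 * cycles), and Backend_Bound is defined as whatever the other three leave over, so the four always sum to 1. A minimal Python sketch of that arithmetic follows. The function name and the sample counter values are made up for illustration; only the formulas come from the MetricExpr strings in the patch.

```python
def topdown_l1(idq_uops_not_delivered_core, uops_issued_any,
               uops_retired_retire_slots, int_misc_recovery_cycles, cycles):
    """Evaluate the non-SMT TopdownL1 MetricExpr formulas from raw counts."""
    slots = 4 * cycles  # SLOTS: total issue-pipeline slots

    frontend_bound = idq_uops_not_delivered_core / slots
    bad_speculation = (uops_issued_any - uops_retired_retire_slots
                       + 4 * int_misc_recovery_cycles) / slots
    retiring = uops_retired_retire_slots / slots
    # Backend_Bound is the remainder once the other three are accounted for.
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)

    return {
        "Frontend_Bound": frontend_bound,
        "Bad_Speculation": bad_speculation,
        "Retiring": retiring,
        "Backend_Bound": backend_bound,
    }


# Hypothetical counter values, not measurements from any real run.
result = topdown_l1(
    idq_uops_not_delivered_core=800,
    uops_issued_any=2200,
    uops_retired_retire_slots=2000,
    int_misc_recovery_cycles=50,
    cycles=1000,
)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The _SMT variants use the same numerators (Bad_Speculation_SMT takes INT_MISC.RECOVERY_CYCLES_ANY / 2 in place of INT_MISC.RECOVERY_CYCLES) and replace "cycles" with the CORE_CLKS expression.
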
>> diff --git 
>> a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
>> index cfeba5067bab..c6b485b3a2cb 100644
>> --- a/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
>> @@ -1,226 +1,226 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 32 * ( ICACHE.HIT + 
>> ICACHE.MISSES ) / 4 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + LSD.UOPS + 
>> IDQ.MITE_UOPS + IDQ.MS_UOPS ) )",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 
>> 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_DISPATCHED.THREAD / (( 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_DISPATCHED.THREAD / (( 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@ / 2 ) if #SMT_on else 
>> cpu@...S_DISPATCHED.CORE\\,cmask\\=1@)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_COMP_OPS_EXE.SSE_SCALAR_SINGLE + 
>> FP_COMP_OPS_EXE.SSE_SCALAR_DOUBLE ) + 2 * 
>> FP_COMP_OPS_EXE.SSE_PACKED_DOUBLE + 4 * ( 
>> FP_COMP_OPS_EXE.SSE_PACKED_SINGLE + SIMD_FP_256.PACKED_DOUBLE ) + 8 * 
>> SIMD_FP_256.PACKED_SINGLE )) / 1000000000 ) / duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
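
For readers following the TopdownL1 expressions quoted above, here is a minimal sketch of how the four non-SMT level-1 formulas and the SMT core-clock expression evaluate from raw counter values. It is not part of the patch: the function names `topdown_l1` and `core_clks_smt`, their parameter names, and the sample counts are hypothetical and chosen only for illustration. The arithmetic mirrors the quoted MetricExpr strings.

```python
def topdown_l1(uops_issued, uops_retired_slots, recovery_cycles,
               idq_uops_not_delivered, cycles, width=4):
    """Evaluate the non-SMT TopdownL1 MetricExpr formulas from raw counts.

    Hypothetical helper; arguments stand in for UOPS_ISSUED.ANY,
    UOPS_RETIRED.RETIRE_SLOTS, INT_MISC.RECOVERY_CYCLES,
    IDQ_UOPS_NOT_DELIVERED.CORE and cycles.
    """
    slots = width * cycles  # SLOTS = 4 * cycles
    frontend_bound = idq_uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Backend_Bound is defined as whatever the other three leave over.
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)
    return {
        "Frontend_Bound": frontend_bound,
        "Bad_Speculation": bad_speculation,
        "Retiring": retiring,
        "Backend_Bound": backend_bound,
    }


def core_clks_smt(thread_clks, one_thread_active, ref_xclk):
    """CORE_CLKS as used by the *_SMT metrics.

    Arguments stand in for CPU_CLK_UNHALTED.THREAD,
    CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE and CPU_CLK_UNHALTED.REF_XCLK.
    """
    return (thread_clks / 2) * (1 + one_thread_active / ref_xclk)


if __name__ == "__main__":
    # Made-up sample counts, purely to exercise the formulas.
    metrics = topdown_l1(uops_issued=2200, uops_retired_slots=2000,
                         recovery_cycles=50, idq_uops_not_delivered=800,
                         cycles=1000)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    print("CORE_CLKS:", core_clks_smt(1000, 50, 100))
```

Because Backend_Bound is computed as the remainder, the four fractions sum to 1 by construction; the SMT variants use the same shape with `4 * CORE_CLKS` in place of `4 * cycles`.
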
>> diff --git a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json 
>> b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
>> index 2c95417a4dae..0ca539bb60f6 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
>> @@ -1,364 +1,370 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessarily mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessarily mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( 
>> ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( 
>> ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + IDQ.MITE_UOPS 
>> + IDQ.MS_UOPS ))",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + IDQ.MITE_UOPS + 
>> IDQ.MS_UOPS)",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (( 
>> INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 
>> * cycles) / BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "Branch_Misprediction_Cost"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) 
>> + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 
>> * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / 
>> BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts_SMT",
>> +        "MetricGroup": "BrMispredicts_SMT",
>>           "MetricName": "Branch_Misprediction_Cost_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * cycles )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * cycles )",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 
>> 1000000000 / duration_time",
>>           "BriefDescription": "Average per-core data access bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 
>> 1000000000 / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Access_BW"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "64 * ( arb@...nt\\=0x81\\,umask\\=0x1@ + 
>> arb@...nt\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -        "MetricExpr": "arb@...nt\\=0x80\\,umask\\=0x2@ / 
>> arb@...nt\\=0x80\\,umask\\=0x2\\,thresh\\=1@",
>>           "BriefDescription": "Average number of parallel data read 
>> requests to external memory. Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "arb@...nt\\=0x80\\,umask\\=0x2@ / 
>> arb@...nt\\=0x80\\,umask\\=0x2\\,thresh\\=1@",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_Parallel_Reads"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Far Branch ( Far 
>> Branches apply upon transition from application to operating system, 
>> handling interrupts, exceptions. )",
>> +        "MetricExpr": "INST_RETIRED.ANY / ( 
>> BR_INST_RETIRED.FAR_BRANCH / 2 )",
>> +        "MetricGroup": "",
>> +        "MetricName": "IpFarBranch"
>> +    },
>> +    {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
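
For anyone reading along: the residency metrics above all share one shape, a C-state residency count divided by the TSC count, times 100. A minimal Python sketch of that arithmetic follows. The function name and the counter values are made up for illustration and are not part of the patch or of perf.

```python
def residency_percent(residency_count: float, tsc_count: float) -> float:
    """Percent of TSC ticks spent in a C-state.

    Mirrors the "(residency / msr@tsc@) * 100" shape of the metric
    expressions; this helper is hypothetical, not a perf API.
    """
    if tsc_count <= 0:
        raise ValueError("tsc_count must be positive")
    return residency_count / tsc_count * 100


# Hypothetical counts over one measurement interval.
print(residency_percent(750_000_000, 3_000_000_000))  # 25.0
```
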
>> diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
>> index 35b255fa6a79..047d7e11aa6f 100644
>> --- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
>> +++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
>> @@ -1,376 +1,394 @@
>>   [
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Frontend_Bound"
>> +        "MetricName": "Frontend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound."
>>       },
>>       {
>> -        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Frontend_Bound_SMT"
>> +        "MetricName": "Frontend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where the processor's Frontend undersupplies its Backend. 
>> Frontend denotes the first part of the processor core responsible to 
>> fetch operations that are executed later on by the Backend part. 
>> Within the Frontend; a branch predictor predicts the next address to 
>> fetch; cache-lines are fetched from the memory subsystem; parsed into 
>> instructions; and lastly decoded into micro-ops (uops). Ideally the 
>> Frontend can issue 4 uops every cycle to the Backend. Frontend Bound 
>> denotes unutilized issue-slots when there is no Backend stall; i.e. 
>> bubbles where Frontend delivered no uops while Backend could have 
>> accepted them. For example; stalls due to instruction-cache misses 
>> would be categorized under Frontend Bound. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Bad_Speculation"
>> +        "MetricName": "Bad_Speculation",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example."
>>       },
>>       {
>> -        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS 
>> + 4 * (( INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Bad_Speculation_SMT"
>> +        "MetricName": "Bad_Speculation_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots wasted due to incorrect speculations. This include slots used to 
>> issue uops that do not eventually get retired and slots for which the 
>> issue-pipeline was blocked due to recovery from earlier incorrect 
>> speculation. For example; wasted work due to miss-predicted branches 
>> are categorized under Bad Speculation category. Incorrect data 
>> speculation followed by Memory Ordering Nukes is another example. SMT 
>> version; use when SMT is enabled and measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * 
>> cycles)) + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles)) + 
>> (UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)) )",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Backend_Bound"
>> +        "MetricName": "Backend_Bound",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound."
>>       },
>>       {
>> -        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>> -        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. SMT version; use when 
>> SMT is enabled and measuring per logical CPU.",
>> +        "MetricExpr": "1 - ( (IDQ_UOPS_NOT_DELIVERED.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (( UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> + (UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) 
>> * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK 
>> ) )))) )",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Backend_Bound_SMT"
>> +        "MetricName": "Backend_Bound_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots where no uops are being delivered due to a lack of required 
>> resources for accepting new uops in the Backend. Backend is the 
>> portion of the processor core where the out-of-order scheduler 
>> dispatches ready uops into their respective execution units; and once 
>> completed these uops get retired according to program order. For 
>> example; stalls due to data-cache misses or stalls due to the divider 
>> unit being overloaded are both categorized under Backend Bound. 
>> Backend Bound is further divided into two main categories: Memory 
>> Bound and Core Bound. SMT version; use when SMT is enabled and 
>> measuring per logical CPU."
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. ",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * cycles)",
>>           "MetricGroup": "TopdownL1",
>> -        "MetricName": "Retiring"
>> +        "MetricName": "Retiring",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. "
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>> -        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU.",
>>           "BriefDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. SMT version; use when SMT is enabled and measuring per 
>> logical CPU.",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))",
>>           "MetricGroup": "TopdownL1_SMT",
>> -        "MetricName": "Retiring_SMT"
>> +        "MetricName": "Retiring_SMT",
>> +        "PublicDescription": "This category represents fraction of 
>> slots utilized by useful work i.e. issued uops that eventually get 
>> retired. Ideally; all pipeline slots would be attributed to the 
>> Retiring category.  Retiring of 100% would indicate the maximum 4 uops 
>> retired per cycle has been achieved.  Maximizing Retiring typically 
>> increases the Instruction-Per-Cycle metric. Note that a high Retiring 
>> value does not necessary mean there is no room for more performance.  
>> For example; Microcode assists are categorized under Retiring. They 
>> hurt performance and can often be avoided. SMT version; use when SMT 
>> is enabled and measuring per logical CPU."
>>       },
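
The four non-SMT TopdownL1 expressions above divide by the same slot count (4 * cycles), and Backend_Bound is defined as whatever is left after the other three. A rough Python sketch of that relationship follows. The function and argument names are my own and the counter values are invented, so treat it as an illustration of the formulas rather than how perf evaluates them.

```python
def topdown_l1(idq_uops_not_delivered_core, uops_issued_any,
               uops_retired_retire_slots, int_misc_recovery_cycles, cycles):
    """Evaluate the non-SMT TopdownL1 expressions from the metric JSON."""
    slots = 4 * cycles  # SLOTS: 4 issue slots per cycle
    frontend_bound = idq_uops_not_delivered_core / slots
    bad_speculation = (uops_issued_any - uops_retired_retire_slots
                       + 4 * int_misc_recovery_cycles) / slots
    retiring = uops_retired_retire_slots / slots
    # Backend_Bound is the remainder, so the four fractions sum to 1.
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)
    return {
        "Frontend_Bound": frontend_bound,
        "Bad_Speculation": bad_speculation,
        "Retiring": retiring,
        "Backend_Bound": backend_bound,
    }


# Hypothetical counter values for one interval.
result = topdown_l1(idq_uops_not_delivered_core=800, uops_issued_any=2200,
                    uops_retired_retire_slots=2000,
                    int_misc_recovery_cycles=50, cycles=1000)
for name, fraction in result.items():
    print(f"{name}: {fraction:.2f}")
```
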
>>       {
>> +        "BriefDescription": "Instructions Per Cycle (per Logical 
>> Processor)",
>>           "MetricExpr": "INST_RETIRED.ANY / CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Instructions Per Cycle (per logical 
>> thread)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "IPC"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>>           "BriefDescription": "Uops Per Instruction",
>> -        "MetricGroup": "Pipeline;Retiring",
>> +        "MetricExpr": "UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Pipeline;Retire",
>>           "MetricName": "UPI"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Instruction per taken branch",
>> -        "MetricGroup": "Branches;PGO",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
>> +        "MetricGroup": "Branches;Fetch_BW;PGO",
>>           "MetricName": "IpTB"
>>       },
>>       {
>> -        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "BriefDescription": "Branch instructions per taken branch. ",
>> +        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / 
>> BR_INST_RETIRED.NEAR_TAKEN",
>>           "MetricGroup": "Branches;PGO",
>>           "MetricName": "BpTB"
>>       },
>>       {
>> -        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( 
>> ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )",
>>           "BriefDescription": "Rough Estimation of fraction of fetched 
>> lines bytes that were likely (includes speculatively fetches) consumed 
>> by program instructions",
>> -        "MetricGroup": "PGO",
>> +        "MetricExpr": "min( 1 , UOPS_ISSUED.ANY / ( 
>> (UOPS_RETIRED.RETIRE_SLOTS / INST_RETIRED.ANY) * 64 * ( 
>> ICACHE_64B.IFTAG_HIT + ICACHE_64B.IFTAG_MISS ) / 4.1 ) )",
>> +        "MetricGroup": "PGO;IcMiss",
>>           "MetricName": "IFetch_Line_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "IDQ.DSB_UOPS / (( IDQ.DSB_UOPS + IDQ.MITE_UOPS 
>> + IDQ.MS_UOPS ))",
>>           "BriefDescription": "Fraction of Uops delivered by the DSB 
>> (aka Decoded ICache; or Uop Cache)",
>> -        "MetricGroup": "DSB;Frontend_Bandwidth",
>> +        "MetricExpr": "IDQ.DSB_UOPS / (IDQ.DSB_UOPS + IDQ.MITE_UOPS + 
>> IDQ.MS_UOPS)",
>> +        "MetricGroup": "DSB;Fetch_BW",
>>           "MetricName": "DSB_Coverage"
>>       },
>>       {
>> +        "BriefDescription": "Cycles Per Instruction (per Logical 
>> Processor)",
>>           "MetricExpr": "1 / (INST_RETIRED.ANY / cycles)",
>> -        "BriefDescription": "Cycles Per Instruction (threaded)",
>>           "MetricGroup": "Pipeline;Summary",
>>           "MetricName": "CPI"
>>       },
>>       {
>> +        "BriefDescription": "Per-Logical Processor actual clocks when 
>> the Logical Processor is active.",
>>           "MetricExpr": "CPU_CLK_UNHALTED.THREAD",
>> -        "BriefDescription": "Per-thread actual clocks when the 
>> logical processor is active.",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * cycles",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1",
>>           "MetricName": "SLOTS"
>>       },
>>       {
>> +        "BriefDescription": "Total issue-pipeline slots (per-Physical 
>> Core)",
>>           "MetricExpr": "4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 
>> + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>> -        "BriefDescription": "Total issue-pipeline slots (per core)",
>>           "MetricGroup": "TopDownL1_SMT",
>>           "MetricName": "SLOTS_SMT"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Load (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
>> -        "BriefDescription": "Instructions per Load (lower number 
>> means loads are more frequent)",
>> -        "MetricGroup": "Instruction_Type;L1_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpL"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Store (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
>> -        "BriefDescription": "Instructions per Store",
>> -        "MetricGroup": "Instruction_Type;Store_Bound",
>> +        "MetricGroup": "Instruction_Type",
>>           "MetricName": "IpS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Branch (lower number 
>> means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / 
>> BR_INST_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Instructions per Branch",
>> -        "MetricGroup": "Branches;Instruction_Type;Port_5;Port_6",
>> +        "MetricGroup": "Branches;Instruction_Type",
>>           "MetricName": "IpB"
>>       },
>>       {
>> +        "BriefDescription": "Instruction per (near) call (lower 
>> number means higher occurrence rate)",
>>           "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
>> -        "BriefDescription": "Instruction per (near) call",
>>           "MetricGroup": "Branches",
>>           "MetricName": "IpCall"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY",
>>           "BriefDescription": "Total number of retired Instructions",
>> +        "MetricExpr": "INST_RETIRED.ANY",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Instructions"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / cycles",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Instructions Per Cycle (per physical 
>> core)",
>> +        "MetricExpr": "INST_RETIRED.ANY / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CoreIPC_SMT"
>>       },
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / cycles",
>>           "MetricGroup": "FLOPS",
>>           "MetricName": "FLOPc"
>>       },
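
The FLOPc expression above is a weighted sum of the FP_ARITH_INST_RETIRED counters, where each weight equals the number of elements per operation (1 for scalar, up to 16 for 512-bit packed single), divided by cycles. A small Python sketch of the same sum follows. The event names and weights are copied from the expression, while the helper name and the sample counts are assumptions for illustration.

```python
# Weights taken from the FLOPc MetricExpr: elements per retired instruction.
FLOP_WEIGHTS = {
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
    "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE": 8,
    "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE": 16,
}


def flop_per_cycle(counts, cycles):
    """Weighted FP operation count divided by cycles (hypothetical helper)."""
    flops = sum(weight * counts.get(event, 0)
                for event, weight in FLOP_WEIGHTS.items())
    return flops / cycles


# Hypothetical counts; events missing from the dict are treated as zero.
sample = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 100,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 50,
    "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE": 10,
}
print(flop_per_cycle(sample, cycles=1000))
```
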
>>       {
>> -        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "BriefDescription": "Floating Point Operations Per Cycle",
>> +        "MetricExpr": "(( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE + 
>> FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))",
>>           "MetricGroup": "FLOPS_SMT",
>>           "MetricName": "FLOPc_SMT"
>>       },
>>       {
>> -        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1)",
>>           "BriefDescription": "Instruction-Level-Parallelism (average 
>> number of uops executed when there is at least 1 uop executed)",
>> -        "MetricGroup": "Pipeline;Ports_Utilization",
>> +        "MetricExpr": "UOPS_EXECUTED.THREAD / (( 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 ) if #SMT_on else 
>> UOPS_EXECUTED.CORE_CYCLES_GE_1)",
>> +        "MetricGroup": "Pipeline",
>>           "MetricName": "ILP"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * 
>> INT_MISC.RECOVERY_CYCLES ) / (4 * cycles))) + (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) * (( 
>> INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 * 
>> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * cycles)) ) * (4 
>> * cycles) / BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "Branch_Misprediction_Cost"
>>       },
>>       {
>> +        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per non-speculative branch misprediction (jeclear)",
>>           "MetricExpr": "( ((BR_MISP_RETIRED.ALL_BRANCHES / ( 
>> BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT )) * (( 
>> UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (( 
>> INT_MISC.RECOVERY_CYCLES_ANY / 2 )) ) / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))))) 
>> + (4 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> * (( INT_MISC.CLEAR_RESTEER_CYCLES + 9 * BACLEARS.ANY ) / cycles) / (4 
>> * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE / (4 * (( ( 
>> CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )))) 
>> ) * (4 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ))) / 
>> BR_MISP_RETIRED.ALL_BRANCHES",
>> -        "BriefDescription": "Branch Misprediction Cost: Fraction of 
>> TopDown slots wasted per branch misprediction (jeclear and baclear)",
>> -        "MetricGroup": "Branch_Mispredicts_SMT",
>> +        "MetricGroup": "BrMispredicts_SMT",
>>           "MetricName": "Branch_Misprediction_Cost_SMT"
>>       },
>>       {
>> -        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>>           "BriefDescription": "Number of Instructions per 
>> non-speculative Branch Misprediction (JEClear)",
>> -        "MetricGroup": "Branch_Mispredicts",
>> +        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
>> +        "MetricGroup": "BrMispredicts",
>>           "MetricName": "IpMispredict"
>>       },
>>       {
>> +        "BriefDescription": "Core actual clocks when any Logical 
>> Processor is active on the Physical Core",
>>           "MetricExpr": "( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )",
>> -        "BriefDescription": "Core actual clocks when any thread is 
>> active on the physical core",
>>           "MetricGroup": "SMT",
>>           "MetricName": "CORE_CLKS"
>>       },
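
The CORE_CLKS expression above recurs inside every *_SMT metric, so it may help to see it in isolation. Below is a minimal Python sketch of the arithmetic only. The function name and inputs are illustrative assumptions, and the comments describe what the formula yields for the given numbers, not a statement about the hardware counters.

```python
def core_clks_smt(cpu_clk_unhalted_thread, one_thread_active, ref_xclk):
    """(THREAD / 2) * (1 + ONE_THREAD_ACTIVE / REF_XCLK), as in the JSON."""
    if ref_xclk <= 0:
        raise ValueError("ref_xclk must be positive")
    return (cpu_clk_unhalted_thread / 2) * (1 + one_thread_active / ref_xclk)


# Ratio ONE_THREAD_ACTIVE / REF_XCLK == 0: result is half the thread clocks.
print(core_clks_smt(1000, 0, 500))    # 500.0
# Ratio == 1: result equals the thread clocks.
print(core_clks_smt(1000, 500, 500))  # 1000.0
```
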
>>       {
>> -        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
>>           "BriefDescription": "Actual Average Latency for L1 
>> data-cache miss demand loads (in core cycles)",
>> +        "MetricExpr": "L1D_PEND_MISS.PENDING / ( 
>> MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT )",
>>           "MetricGroup": "Memory_Bound;Memory_Lat",
>>           "MetricName": "Load_Miss_Real_Latency"
>>       },
>>       {
>> +        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-Logical Processor)",
>>           "MetricExpr": "L1D_PEND_MISS.PENDING / 
>> L1D_PEND_MISS.PENDING_CYCLES",
>> -        "BriefDescription": "Memory-Level-Parallelism (average number 
>> of L1 miss demand load when there is at least one such miss. 
>> Per-thread)",
>>           "MetricGroup": "Memory_Bound;Memory_BW",
>>           "MetricName": "MLP"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * cycles )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * cycles )",
>>           "MetricGroup": "TLB",
>>           "MetricName": "Page_Walks_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "BriefDescription": "Utilization of the core's Page 
>> Walker(s) serving STLB misses triggered by instruction/Load/Store 
>> accesses",
>> +        "MetricExpr": "( ITLB_MISSES.WALK_PENDING + 
>> DTLB_LOAD_MISSES.WALK_PENDING + DTLB_STORE_MISSES.WALK_PENDING + 
>> EPT.WALK_PENDING ) / ( 2 * (( ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + 
>> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) )) )",
>>           "MetricGroup": "TLB_SMT",
>>           "MetricName": "Page_Walks_Utilization_SMT"
>>       },
>>       {
>> -        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L1 
>> data cache [GB / sec]",
>> +        "MetricExpr": "64 * L1D.REPLACEMENT / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L1D_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average data fill bandwidth to the L2 
>> cache [GB / sec]",
>> +        "MetricExpr": "64 * L2_LINES_IN.ALL / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L2_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1000000000 / 
>> duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Fill_BW"
>>       },
>>       {
>> -        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 
>> 1000000000 / duration_time",
>>           "BriefDescription": "Average per-core data fill bandwidth to 
>> the L3 cache [GB / sec]",
>> +        "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 
>> 1000000000 / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "L3_Cache_Access_BW"
>>       },
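
The four bandwidth metrics above share one shape: a count of 64-byte transfers, scaled to GB and divided by duration_time. A rough Python sketch of that arithmetic follows; the helper name and sample numbers are assumptions for illustration, not anything perf provides.

```python
CACHE_LINE_BYTES = 64  # the constant factor used in the MetricExpr strings


def fill_bw_gb_per_sec(line_count, duration_sec):
    """Evaluate 64 * <event count> / 1000000000 / duration_time."""
    return CACHE_LINE_BYTES * line_count / 1_000_000_000 / duration_sec


# Hypothetical run: 1e9 counted lines (e.g. L1D.REPLACEMENT) over 2 seconds.
print(fill_bw_gb_per_sec(1_000_000_000, 2.0))
```

The same helper would apply to L2_LINES_IN.ALL, LONGEST_LAT_CACHE.MISS and OFFCORE_REQUESTS.ALL_REQUESTS, since only the event in the numerator changes.
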
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L1 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L1_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L1MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L2_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI"
>>       },
>>       {
>> -        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache misses per kilo instruction 
>> for all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * L2_RQSTS.MISS / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2MPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>>           "BriefDescription": "L2 cache hits per kilo instruction for 
>> all request types (including speculative)",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) 
>> / INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L2HPKI_All"
>>       },
>>       {
>> -        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>>           "BriefDescription": "L3 cache true misses per kilo 
>> instruction for retired demand loads",
>> -        "MetricGroup": "Cache_Misses;",
>> +        "MetricExpr": "1000 * MEM_LOAD_RETIRED.L3_MISS / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "Cache_Misses",
>>           "MetricName": "L3MPKI"
>>       },
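
All of the *PKI metrics in this group are the same normalization: event count per thousand retired instructions. As a minimal sketch (the function name and counts are hypothetical, only the formula comes from the patch):

```python
def per_kilo_instruction(event_count, instructions):
    """Evaluate 1000 * <event count> / INST_RETIRED.ANY."""
    return 1000 * event_count / instructions


def l2_hits_pki(l2_references, l2_misses, instructions):
    """Evaluate 1000 * ( L2_RQSTS.REFERENCES - L2_RQSTS.MISS ) / INST_RETIRED.ANY."""
    return per_kilo_instruction(l2_references - l2_misses, instructions)


# Hypothetical counts: 5,000 L1 misses over 1,000,000 retired instructions.
print(per_kilo_instruction(5_000, 1_000_000))
```
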
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>> +        "BriefDescription": "Rate of silent evictions from the L2 
>> cache per Kilo instruction where the evicted lines are dropped (no 
>> writeback to L3 or memory)",
>> +        "MetricExpr": "1000 * L2_LINES_OUT.SILENT / INST_RETIRED.ANY",
>> +        "MetricGroup": "",
>> +        "MetricName": "L2_Evictions_Silent_PKI"
>> +    },
>> +    {
>> +        "BriefDescription": "Rate of non silent evictions from the L2 
>> cache per Kilo instruction",
>> +        "MetricExpr": "1000 * L2_LINES_OUT.NON_SILENT / 
>> INST_RETIRED.ANY",
>> +        "MetricGroup": "",
>> +        "MetricName": "L2_Evictions_NonSilent_PKI"
>> +    },
>> +    {
>>           "BriefDescription": "Average CPU Utilization",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / msr@tsc@",
>>           "MetricGroup": "Summary",
>>           "MetricName": "CPU_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "BriefDescription": "Giga Floating Point Operations Per 
>> Second",
>> +        "MetricExpr": "( (( 1 * ( FP_ARITH_INST_RETIRED.SCALAR_SINGLE 
>> + FP_ARITH_INST_RETIRED.SCALAR_DOUBLE ) + 2 * 
>> FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE + 4 * ( 
>> FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE ) + 8 * ( 
>> FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE + 
>> FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE ) + 16 * 
>> FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE )) / 1000000000 ) / 
>> duration_time",
>>           "MetricGroup": "FLOPS;Summary",
>>           "MetricName": "GFLOPs"
>>       },
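
The GFLOPs expression is easier to read as a weighted sum, where each FP_ARITH_INST_RETIRED event is multiplied by the per-event factor that appears in the MetricExpr. A sketch in Python, assuming the counts arrive as a plain dict keyed by event name (that dict layout and the function name are my assumptions, not a perf interface):

```python
# Per-event multipliers copied from the MetricExpr above.
FLOP_WEIGHTS = {
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
    "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE": 8,
    "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE": 16,
}


def gflops(counts, duration_sec):
    """Weighted sum of FP event counts, scaled to giga-ops per second.

    Events missing from `counts` are treated as zero.
    """
    flops = sum(w * counts.get(event, 0) for event, w in FLOP_WEIGHTS.items())
    return flops / 1_000_000_000 / duration_sec


# Hypothetical one-second sample.
sample = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1_000_000_000,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 500_000_000,
}
print(gflops(sample, 1.0))
```
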
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Average Frequency Utilization relative 
>> nominal frequency",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.THREAD / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Power",
>>           "MetricName": "Turbo_Utilization"
>>       },
>>       {
>> +        "BriefDescription": "Fraction of cycles where both hardware 
>> Logical Processors were active",
>>           "MetricExpr": "1 - CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE 
>> / ( CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY / 2 ) if #SMT_on else 0",
>> -        "BriefDescription": "Fraction of cycles where both hardware 
>> threads were active",
>>           "MetricGroup": "SMT;Summary",
>>           "MetricName": "SMT_2T_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "BriefDescription": "Fraction of cycles spent in Kernel mode",
>> +        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC:u / 
>> CPU_CLK_UNHALTED.REF_TSC",
>>           "MetricGroup": "Summary",
>>           "MetricName": "Kernel_Utilization"
>>       },
>>       {
>> -        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "BriefDescription": "Average external Memory Bandwidth Use 
>> for reads and writes [GB / sec]",
>> +        "MetricExpr": "( 64 * ( uncore_imc@..._count_read@ + 
>> uncore_imc@..._count_write@ ) / 1000000000 ) / duration_time",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_BW_Use"
>>       },
>>       {
>> -    "MetricExpr": "1000000000 * ( 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / 
>> cha@...nt\\=0x35\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ ) / ( 
>> cha_0@...nt\\=0x0@ / duration_time )",
>>           "BriefDescription": "Average latency of data read request to 
>> external memory (in nanoseconds). Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "1000000000 * ( 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21@ / 
>> cha@...nt\\=0x35\\\\\\,umask\\=0x21@ ) / ( cha_0@...nt\\=0x0@ / 
>> duration_time )",
>>           "MetricGroup": "Memory_Lat",
>>           "MetricName": "DRAM_Read_Latency"
>>       },
>>       {
>> -    "MetricExpr": 
>> "cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,config\\=0x40433@ / 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1\\\\\\,config\\=0x40433@",
>>           "BriefDescription": "Average number of parallel data read 
>> requests to external memory. Accounts for demand loads and L1/L2 
>> prefetches",
>> +        "MetricExpr": "cha@...nt\\=0x36\\\\\\,umask\\=0x21@ / 
>> cha@...nt\\=0x36\\\\\\,umask\\=0x21\\\\\\,thresh\\=1@",
>>           "MetricGroup": "Memory_BW",
>>           "MetricName": "DRAM_Parallel_Reads"
>>       },
>>       {
>> -        "MetricExpr": "cha_0@...nt\\=0x0@",
>>           "BriefDescription": "Socket actual clocks when any core is 
>> active on that socket",
>> +        "MetricExpr": "cha_0@...nt\\=0x0@",
>>           "MetricGroup": "",
>>           "MetricName": "Socket_CLKS"
>>       },
>>       {
>> +        "BriefDescription": "Instructions per Far Branch ( Far 
>> Branches apply upon transition from application to operating system, 
>> handling interrupts, exceptions. )",
>> +        "MetricExpr": "INST_RETIRED.ANY / ( 
>> BR_INST_RETIRED.FAR_BRANCH / 2 )",
>> +        "MetricGroup": "",
>> +        "MetricName": "IpFarBranch"
>> +    },
>> +    {
>> +        "BriefDescription": "C3 residency percent per core",
>>           "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per core",
>>           "MetricName": "C3_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per core",
>>           "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per core",
>>           "MetricName": "C6_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per core",
>>           "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per core",
>>           "MetricName": "C7_Core_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C2 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C2 residency percent per package",
>>           "MetricName": "C2_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C3 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C3 residency percent per package",
>>           "MetricName": "C3_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C6 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C6 residency percent per package",
>>           "MetricName": "C6_Pkg_Residency"
>>       },
>>       {
>> +        "BriefDescription": "C7 residency percent per package",
>>           "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
>>           "MetricGroup": "Power",
>> -        "BriefDescription": "C7 residency percent per package",
>>           "MetricName": "C7_Pkg_Residency"
>>       }
>>   ]
>>