linux-kernel - [RFC] perf script AMD/IBS: Add scripts to show function/instruction level granular profile

Message-ID: <20250124060638.905-1-ravi.bangoria@amd.com>
Date: Fri, 24 Jan 2025 06:06:38 +0000
From: Ravi Bangoria <ravi.bangoria@....com>
To: <acme@...nel.org>, <namhyung@...nel.org>
CC: <ravi.bangoria@....com>, <peterz@...radead.org>, <mingo@...hat.com>,
	<eranian@...gle.com>, <irogers@...gle.com>, <kan.liang@...ux.intel.com>,
	<jolsa@...nel.org>, <adrian.hunter@...el.com>,
	<alexander.shishkin@...ux.intel.com>, <bp@...en8.de>, <mark.rutland@....com>,
	<linux-kernel@...r.kernel.org>, <linux-perf-users@...r.kernel.org>,
	<santosh.shukla@....com>, <ananth.narayan@....com>, <sandipan.das@....com>
Subject: [RFC] perf script AMD/IBS: Add scripts to show function/instruction level granular profile

AMD IBS (Instruction Based Sampling) PMUs provide various insights
into instruction execution in the front-end and back-end units.
Several perf tools (e.g. precise mode (:p), perf-mem, perf-c2c) use
part of this information, but a lot of other insightful data still
remains unused by perf. I could not think of any generic perf tool
where I could consolidate and show all of this data, so I decided to
add perf-python scripts.

1) amd-ibs-op-metrics.py: Print various back-end metric events at
   function granularity using AMD IBS Op PMU.
2) amd-ibs-op-metrics-annotate.py: Print various back-end metric events
   at instruction granularity using AMD IBS Op PMU.
3) amd-ibs-fetch-metrics.py: Print various front-end metric events at
   function granularity using AMD IBS Fetch PMU.
   (Annotate script can be added for Fetch PMU as well).

This is still an early prototype and thus has a lot of rough edges.
Please feel free to report bugs or suggest enhancements if you find
these scripts useful.

Example usage:

IBS Op:

  # perf record -a -e ibs_op// -c 1000000 --raw-sample -- make
  [ perf record: Woken up 91 times to write data ]
  [ perf record: Captured and wrote 49.926 MB perf.data (386979 samples) ]

  # perf script -s amd-ibs-op-metrics.py -- --sort=dc_miss,l2_miss | head -15
  Sort Order: dc_miss,l2_miss
  Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples
                                               |      Nr |      Nr                                                          90th     Avg |  L1Dtlb            L2Dtlb              90th     Avg |          Branch           |
  function                                     | Samples |    LdSt  DcMiss       (%)  L2Miss       (%)  L3Miss       (%)  PctLat     Lat |    Miss       (%)    Miss       (%)  PctLat     Lat |    Miss/Retired       (%) | dso
  --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  clear_page_erms [K]                          |    6704 |    6059    4767 ( 78.68%)    4085 ( 67.42%)    4027 ( 66.46%)       0       0 |      13 (  0.21%)       4 (  0.07%)      76      80 |       0/5       (  0.00%) | [kernel.kallsyms]
  __memmove_avx512_unaligned_erms [U]          |    6274 |    2461    1298 ( 52.74%)    1099 ( 44.66%)     725 ( 29.46%)     465     265 |     996 ( 40.47%)     668 ( 27.14%)     137      88 |      53/2032    (  2.61%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  __memset_avx512_unaligned_erms [U]           |    2759 |    1343     664 ( 49.44%)     345 ( 25.69%)     143 ( 10.65%)       0       0 |     122 (  9.08%)      20 (  1.49%)      94      44 |      20/317     (  6.31%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  _copy_to_iter [K]                            |     918 |     640     351 ( 54.84%)     231 ( 36.09%)     163 ( 25.47%)    1341     391 |      13 (  2.03%)       5 (  0.78%)    1567     369 |       0/3       (  0.00%) | [kernel.kallsyms]
  pop_scope [U]                                |    1648 |     960     302 ( 31.46%)     258 ( 26.88%)     224 ( 23.33%)    1515     493 |      59 (  6.15%)      15 (  1.56%)     782     205 |       6/534     (  1.12%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
  memset [K]                                   |     776 |     505     185 ( 36.63%)      61 ( 12.08%)      46 (  9.11%)       0       0 |       3 (  0.59%)       2 (  0.40%)    4985    2200 |       0/9       (  0.00%) | [kernel.kallsyms]
  _int_malloc [U]                              |    4534 |    1523     178 ( 11.69%)      43 (  2.82%)       6 (  0.39%)      40      25 |      88 (  5.78%)      12 (  0.79%)      84      42 |     103/1141    (  9.03%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  ggc_internal_alloc [U]                       |    2891 |    1254     138 ( 11.00%)      78 (  6.22%)      45 (  3.59%)     905     267 |      80 (  6.38%)       1 (  0.08%)      10      17 |      16/448     (  3.57%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
  native_queued_spin_lock_slowpath [K]         |   36544 |   17736     125 (  0.70%)     124 (  0.70%)     115 (  0.65%)     695     390 |       0 (  0.00%)       0 (  0.00%)       0       0 |      18/17327   (  0.10%) | [kernel.kallsyms]
  get_mem_cgroup_from_mm [K]                   |     985 |     341     122 ( 35.78%)       9 (  2.64%)       1 (  0.29%)      23      19 |      74 ( 21.70%)       0 (  0.00%)       7       7 |       0/297     (  0.00%) | [kernel.kallsyms]

  o Default sort order is Nr Samples.
  o Cache misses and TLB misses percentages are wrt Nr LdSt. Branch
    miss percentages are wrt branches retired.
  o Use --help for more detail.
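
The percentage columns can be cross-checked by hand. The following is a
minimal sketch, not part of the patch; the helper name pct() is
hypothetical, and the input values are copied from the clear_page_erms
row above:

```python
def pct(count, total):
    """Return count as a percentage of total, or 0.0 when total is zero."""
    return (count * 100.0) / total if total else 0.0

# Values copied from the clear_page_erms row of the sample output.
nr_ldst, dc_miss, l2_miss, l3_miss = 6059, 4767, 4085, 4027
br_miss, br_ret = 0, 5

# Cache miss percentages are relative to Nr LdSt, not Nr Samples.
print("DcMiss %6.2f%%" % pct(dc_miss, nr_ldst))
print("L2Miss %6.2f%%" % pct(l2_miss, nr_ldst))
print("L3Miss %6.2f%%" % pct(l3_miss, nr_ldst))

# Branch miss percentage is relative to branches retired.
print("BrMiss %6.2f%%" % pct(br_miss, br_ret))
```

The zero-total guard mirrors what the scripts do for functions that
have no load/store or branch samples.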

IBS Op Annotate:

  # perf script -s amd-ibs-op-metrics-annotate.py -- --dso=/home/ravi/linux/vmlinux --symbol=clear_page_erms
                                                   |      Nr |                                                                  90th     Avg |  L1Dtlb            L2Dtlb              90th     Avg |          Branch
  Disassembly                                      | Samples |    LdSt  DcMiss       (%)  L2Miss       (%)  L3Miss       (%)  PctLat     Lat |    Miss       (%)    Miss       (%)  PctLat     Lat |    Miss/Retired       (%)
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  ffffffff821d3e10: mov    $0x1000,%ecx            |       6 |       0       0 (  0.00%)       0 (  0.00%)       0 (  0.00%)       0       0 |       0 (  0.00%)       0 (  0.00%)       0       0 |       0/0       (  0.00%)
  ffffffff821d3e15: xor    %eax,%eax               |       4 |       0       0 (  0.00%)       0 (  0.00%)       0 (  0.00%)       0       0 |       0 (  0.00%)       0 (  0.00%)       0       0 |       0/0       (  0.00%)
  ffffffff821d3e17: rep stos %al,%es:(%rdi)        |    6687 |    6059    4767 ( 78.68%)    4085 ( 67.42%)    4027 ( 66.46%)       0       0 |      13 (  0.21%)       4 (  0.07%)      76      80 |       0/0       (  0.00%)
  ffffffff821d3e19: jmp    ffffffff821f27a0        |       7 |       0       0 (  0.00%)       0 (  0.00%)       0 (  0.00%)       0       0 |       0 (  0.00%)       0 (  0.00%)       0       0 |       0/5       (  0.00%)
  Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  o Rows follow the actual disassembly of the function, so the data is
    not sorted.
  o Cache misses and TLB misses percentages are wrt Nr LdSt. Branch
    miss percentages are wrt branches retired.
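
To attribute a sample to a row, the annotate script matches the
sample's symbol offset against the start offsets of the disassembled
instructions. The patch does this with a linear scan; the sketch below
shows the same idea using bisect instead, as an illustration only. The
offsets are derived from the clear_page_erms addresses above, and the
function name find_insn() is hypothetical:

```python
from bisect import bisect_right

# Start offsets relative to ffffffff821d3e10, taken from the listing above.
insn_offsets = [0x0, 0x5, 0x7, 0x9]
insns = [
    "mov    $0x1000,%ecx",
    "xor    %eax,%eax",
    "rep stos %al,%es:(%rdi)",
    "jmp    ffffffff821f27a0",
]

def find_insn(symoff):
    """Return the instruction whose start offset is the largest one <= symoff."""
    idx = bisect_right(insn_offsets, symoff) - 1
    return insns[idx] if idx >= 0 else None

# An offset in the middle of "rep stos" (bytes 7..8) maps to that row.
print(find_insn(0x8))
```

Like the script's handling of the last instruction, this treats every
offset at or beyond the final start offset as belonging to the last
row, because the size of the last instruction is not known from the
listing alone.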

IBS Fetch:

  # perf record -a -e ibs_fetch// -c 1000000 --raw-sample -- make
  [ perf record: Woken up 4 times to write data ]
  [ perf record: Captured and wrote 15.051 MB perf.data (112595 samples) ]

  # perf script -s amd-ibs-fetch-metrics.py -- --sort=ic_miss | head -15
  Sort Order: ic_miss
                                               |      Nr |                                                                            90th     Avg |   Fetch           |  L1Itlb            L2Itlb           |
  function                                     | Samples |  OcMiss       (%)  IcMiss       (%)  L2Miss       (%)  L3Miss       (%)  PctLat     Lat |   Abort       (%) |    Miss       (%)    Miss       (%) | dso
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  _int_malloc [U]                              |    1379 |     407 ( 29.51%)     130 (  9.43%)       1 (  0.07%)       0 (  0.00%)      20      14 |       0 (  0.00%) |      11 (  0.80%)       5 (  0.36%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  _cpp_lex_direct [U]                          |    1621 |     133 (  8.20%)      35 (  2.16%)       1 (  0.06%)       0 (  0.00%)      26      16 |       0 (  0.00%) |       1 (  0.06%)       1 (  0.06%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
  mas_walk [K]                                 |     115 |      75 ( 65.22%)      33 ( 28.70%)       0 (  0.00%)       0 (  0.00%)      20      14 |       0 (  0.00%) |       0 (  0.00%)       0 (  0.00%) | [kernel.kallsyms]
  _int_free [U]                                |     598 |      83 ( 13.88%)      32 (  5.35%)       0 (  0.00%)       0 (  0.00%)      17      13 |       0 (  0.00%) |       5 (  0.84%)       3 (  0.50%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  __libc_calloc [U]                            |     202 |      72 ( 35.64%)      31 ( 15.35%)       0 (  0.00%)       0 (  0.00%)      24      27 |       0 (  0.00%) |      10 (  4.95%)       6 (  2.97%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  ggc_internal_alloc [U]                       |     516 |     102 ( 19.77%)      29 (  5.62%)       0 (  0.00%)       0 (  0.00%)      19      14 |       0 (  0.00%) |       6 (  1.16%)       4 (  0.78%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
  _int_free_merge_chunk [U]                    |     219 |      58 ( 26.48%)      29 ( 13.24%)       0 (  0.00%)       0 (  0.00%)      18      14 |       0 (  0.00%) |       4 (  1.83%)       0 (  0.00%) | /usr/lib/x86_64-linux-gnu/libc.so.6
  get_page_from_freelist [K]                   |      68 |      45 ( 66.18%)      28 ( 41.18%)       1 (  1.47%)       0 (  0.00%)      27      23 |       0 (  0.00%) |       0 (  0.00%)       0 (  0.00%) | [kernel.kallsyms]
  __handle_mm_fault [K]                        |      70 |      43 ( 61.43%)      26 ( 37.14%)       2 (  2.86%)       0 (  0.00%)      17      15 |       0 (  0.00%) |       0 (  0.00%)       0 (  0.00%) | [kernel.kallsyms]
  operand_compare::operand_equal_p [U]         |     364 |      82 ( 22.53%)      26 (  7.14%)       1 (  0.27%)       0 (  0.00%)      18      14 |       0 (  0.00%) |       8 (  2.20%)       6 (  1.65%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
  bitmap_set_bit [U]                           |    1917 |      81 (  4.23%)      25 (  1.30%)       0 (  0.00%)       0 (  0.00%)      23      15 |       0 (  0.00%) |      10 (  0.52%)       8 (  0.42%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1

  o Default sort order is Nr Samples.
  o All percentages are wrt Nr Samples.
  o Use --help for more detail.
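
For reference, this is roughly how the fetch script derives its
counters from one raw sample. It is a standalone sketch, not the
patch itself: the bit positions and the raw_buf[4:12] offset are
copied from amd-ibs-fetch-metrics.py below, while the function name
decode_fetch_ctl() and the sample value are made up for illustration:

```python
import struct

# Bit positions copied from the constants in amd-ibs-fetch-metrics.py.
FETCH_LAT_SHIFT = 32
FETCH_COMP_BIT = 50
IC_MISS_BIT = 51
L1_ITLB_MISS_BIT = 55
L2_ITLB_MISS_BIT = 56
L2_MISS_BIT = 58
OC_MISS_BIT = 60
L3_MISS_BIT = 61

def decode_fetch_ctl(raw_buf):
    """Decode the fields the script reads from bytes 4..11 of a raw sample."""
    fetch_ctl = int.from_bytes(raw_buf[4:12], "little")

    def bit(pos):
        return (fetch_ctl >> pos) & 0x1

    return {
        "oc_miss": bit(OC_MISS_BIT),
        "ic_miss": bit(IC_MISS_BIT),
        # As in the script, an L2 miss only counts for a completed fetch.
        "l2_miss": bit(L2_MISS_BIT) & bit(FETCH_COMP_BIT),
        "l3_miss": bit(L3_MISS_BIT),
        "l1_itlb_miss": bit(L1_ITLB_MISS_BIT),
        "l2_itlb_miss": bit(L2_ITLB_MISS_BIT),
        "abort": 1 - bit(FETCH_COMP_BIT),
        "fetch_lat": (fetch_ctl >> FETCH_LAT_SHIFT) & 0xffff,
    }

# Synthetic sample: completed fetch that missed the op cache and the
# instruction cache, with a fetch latency of 20.
ctl = ((1 << OC_MISS_BIT) | (1 << IC_MISS_BIT) |
       (1 << FETCH_COMP_BIT) | (20 << FETCH_LAT_SHIFT))
# The first 4 bytes are treated as opaque, matching the script's offset.
raw_buf = struct.pack("<IQ", 0, ctl)
print(decode_fetch_ctl(raw_buf))
```

The percentages in the table are then these per-sample flags summed per
function and divided by Nr Samples, e.g. 407 / 1379 = 29.51% OcMiss for
_int_malloc.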

Signed-off-by: Ravi Bangoria <ravi.bangoria@....com>
---
 .../scripts/python/amd-ibs-fetch-metrics.py   | 219 +++++++++++
 .../python/amd-ibs-op-metrics-annotate.py     | 342 ++++++++++++++++++
 .../perf/scripts/python/amd-ibs-op-metrics.py | 285 +++++++++++++++
 3 files changed, 846 insertions(+)
 create mode 100644 tools/perf/scripts/python/amd-ibs-fetch-metrics.py
 create mode 100644 tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py
 create mode 100644 tools/perf/scripts/python/amd-ibs-op-metrics.py

diff --git a/tools/perf/scripts/python/amd-ibs-fetch-metrics.py b/tools/perf/scripts/python/amd-ibs-fetch-metrics.py
new file mode 100644
index 000000000000..63a91843585f
--- /dev/null
+++ b/tools/perf/scripts/python/amd-ibs-fetch-metrics.py
@@ -0,0 +1,219 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2025 Advanced Micro Devices, Inc.
+#
+# Print various metric events at function granularity using AMD IBS Fetch PMU.
+
+from __future__ import print_function
+
+import os
+import sys
+import re
+import numpy as np
+from optparse import OptionParser, make_option
+
+# To avoid BrokenPipeError when redirecting output to head/less etc.
+from signal import signal, SIGPIPE, SIG_DFL
+signal(SIGPIPE,SIG_DFL)
+
+# IBS FETCH CTL bit positions
+IBS_FETCH_CTL_FETCH_LAT_SHIFT       = 32
+IBS_FETCH_CTL_IC_MISS_SHIFT         = 51
+IBS_FETCH_CTL_L1_ITLB_MISS_SHIFT    = 55
+IBS_FETCH_CTL_L2_ITLB_MISS_SHIFT    = 56
+IBS_FETCH_CTL_L2_MISS_SHIFT         = 58
+IBS_FETCH_CTL_OC_MISS_SHIFT         = 60
+IBS_FETCH_CTL_L3_MISS_SHIFT         = 61
+IBS_FETCH_CTL_FETCH_COMP            = 50
+
+allowed_sort_keys = ("nr_samples", "oc_miss", "ic_miss", "l2_miss", "l3_miss", "abort", "l1_itlb_miss", "l2_itlb_miss")
+default_sort_order = ("nr_samples",) # Trailing comma is needed for a single-member tuple
+sort_order = default_sort_order
+options = None
+
+def parse_cmdline_options():
+    global sort_order
+    global options
+
+    option_list = [
+        make_option("-s", "--sort", dest="sort",
+                    help="Comma separated custom sort order. Allowed values: " +
+                         ", ".join(allowed_sort_keys))
+    ]
+
+    parser = OptionParser(option_list=option_list)
+    (options, args) = parser.parse_args()
+
+    if (options.sort):
+        sort_err = 0
+        temp = []
+        for sort_option in options.sort.split(","):
+            if sort_option not in allowed_sort_keys:
+                print("ERROR: Invalid sort option: %s" % sort_option)
+                print("       Falling back to default sort order.")
+                sort_err = 1
+                break
+            else:
+                temp.append(sort_option)
+
+        if (sort_err == 0):
+            sort_order = tuple(temp)
+
+parse_cmdline_options()
+
+data = {}
+
+def init_data_element(symbol, cpumode, dso):
+    # XXX: Should the key be dso:symbol ?
+    data[symbol] = {
+        'nr_samples': 0,
+        'cpumode': cpumode,
+
+        'oc_miss': 0,
+        'ic_miss': 0,
+        'l2_miss': 0,
+        'l3_miss': 0,
+        'lat': [],
+
+        'abort': 0,
+
+        'l1_itlb_miss': 0,
+        'l2_itlb_miss': 0,
+
+        # Misc data
+        'dso': dso,
+    }
+
+def get_cpumode(cpumode):
+    if (cpumode == 1):
+        return 'K'
+    if (cpumode == 2):
+        return 'U'
+    if (cpumode == 3):
+        return 'H'
+    if (cpumode == 4):
+        return 'GK'
+    if (cpumode == 5):
+        return 'GU'
+    return '?'
+
+def is_oc_miss(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_OC_MISS_SHIFT) & 0x1
+
+def is_ic_miss(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_IC_MISS_SHIFT) & 0x1
+
+def is_l2_miss(fetch_ctl):
+    return ((fetch_ctl >> IBS_FETCH_CTL_L2_MISS_SHIFT) & 0x1 and
+            (fetch_ctl >> IBS_FETCH_CTL_FETCH_COMP) & 0x1)
+
+def is_l3_miss(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_L3_MISS_SHIFT) & 0x1
+
+def get_fetch_lat(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_FETCH_LAT_SHIFT) & 0xffff
+
+def is_l1_itlb_miss(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_L1_ITLB_MISS_SHIFT) & 0x1
+
+def is_l2_itlb_miss(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_L2_ITLB_MISS_SHIFT) & 0x1
+
+def is_comp(fetch_ctl):
+    return (fetch_ctl >> IBS_FETCH_CTL_FETCH_COMP) & 0x1
+
+def process_event(param_dict):
+    raw_buf = param_dict['raw_buf']
+    fetch_ctl = int.from_bytes(raw_buf[4:12], "little")
+
+    if ('symbol' in param_dict):
+        symbol = param_dict['symbol']
+        symbol = re.sub(r'\(.*\)', '', symbol)
+    else:
+        symbol = hex(param_dict['sample']['ip'])
+
+    if (symbol not in data):
+        init_data_element(symbol, get_cpumode(param_dict['sample']['cpumode']),
+                          param_dict['dso'] if 'dso' in param_dict else "")
+
+    data[symbol]['nr_samples'] += 1
+
+    if (is_oc_miss(fetch_ctl)):
+        data[symbol]['oc_miss'] += 1
+        if (is_ic_miss(fetch_ctl)):
+            data[symbol]['ic_miss'] += 1
+            latency = get_fetch_lat(fetch_ctl)
+            data[symbol]['lat'].append(latency)
+            if (is_l2_miss(fetch_ctl)):
+                data[symbol]['l2_miss'] += 1
+                if (is_l3_miss(fetch_ctl)):
+                    data[symbol]['l3_miss'] += 1
+
+    if (is_l1_itlb_miss(fetch_ctl)):
+        data[symbol]['l1_itlb_miss'] += 1
+        if (is_l2_itlb_miss(fetch_ctl)):
+            data[symbol]['l2_itlb_miss'] += 1
+
+    if (is_comp(fetch_ctl) == 0):
+        data[symbol]['abort'] += 1
+
+def print_sort_order():
+    global sort_order
+    print("Sort Order: " + ",".join(sort_order))
+
+def print_header():
+    print_sort_order()
+    print("%-45s| %7s | %7s %9s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s | %7s %9s %7s %9s | %s" %
+          ("","Nr", "", "", "", "", "", "", "", "", "90th", "Avg", "Fetch", "", "L1Itlb", "", "L2Itlb", "", ""))
+    print("%-45s| %7s | %7s %9s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s | %7s %9s %7s %9s | %s" %
+          ("function", "Samples", "OcMiss", "(%)", "IcMiss", "(%)", "L2Miss", "(%)",
+           "L3Miss", "(%)", "PctLat", "Lat", "Abort", "(%)", "Miss", "(%)", "Miss", "(%)", "dso"))
+    print("-----------------------------------------------------------------------------"
+          "-----------------------------------------------------------------------------"
+          "------------------------------------------------------------------")
+
+def print_footer():
+    print("-----------------------------------------------------------------------------"
+          "-----------------------------------------------------------------------------"
+          "------------------------------------------------------------------")
+    print()
+
+def sort_fun(item):
+    global sort_order
+
+    temp = []
+    for sort_option in sort_order:
+        temp.append(item[1][sort_option])
+    return tuple(temp)
+
+def trace_end():
+    sorted_data = sorted(data.items(), key = sort_fun, reverse = True)
+
+    print_header()
+
+    for d in sorted_data:
+        symbol_cpumode = d[0] + " [" + d[1]['cpumode'] + "]"
+
+        oc_miss_perc = (d[1]['oc_miss'] * 100) / float(d[1]['nr_samples'])
+        ic_miss_perc = (d[1]['ic_miss'] * 100) / float(d[1]['nr_samples'])
+        l2_miss_perc = (d[1]['l2_miss'] * 100) / float(d[1]['nr_samples'])
+        l3_miss_perc = (d[1]['l3_miss'] * 100) / float(d[1]['nr_samples'])
+        abort_perc = (d[1]['abort'] * 100) / float(d[1]['nr_samples'])
+        l1_itlb_miss_perc = (d[1]['l1_itlb_miss'] * 100) / float(d[1]['nr_samples'])
+        l2_itlb_miss_perc = (d[1]['l2_itlb_miss'] * 100) / float(d[1]['nr_samples'])
+
+        avg_lat = 0
+        pct_lat = 0
+        if (d[1]['lat']):
+            avg_lat = sum(d[1]['lat']) / float(len(d[1]['lat']))
+            pct_lat = np.percentile(d[1]['lat'], 90)
+
+        print("%-45s| %7d | %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%)"
+              " %7d %7d | %7d (%6.2f%%) | %7d (%6.2f%%) %7d (%6.2f%%) | %s" %
+              (symbol_cpumode, d[1]['nr_samples'], d[1]['oc_miss'], oc_miss_perc,
+               d[1]['ic_miss'], ic_miss_perc, d[1]['l2_miss'], l2_miss_perc,
+               d[1]['l3_miss'], l3_miss_perc, pct_lat, avg_lat, d[1]['abort'],
+               abort_perc, d[1]['l1_itlb_miss'], l1_itlb_miss_perc,
+               d[1]['l2_itlb_miss'], l2_itlb_miss_perc, d[1]['dso']))
+
+    print_footer()
diff --git a/tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py b/tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py
new file mode 100644
index 000000000000..beef6a302258
--- /dev/null
+++ b/tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py
@@ -0,0 +1,342 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2025 Advanced Micro Devices, Inc.
+#
+# Print various metric events at instruction granularity using AMD IBS Op PMU.
+
+from __future__ import print_function
+
+import os
+import sys
+import re
+import numpy as np
+from optparse import OptionParser, make_option
+import subprocess
+
+# To avoid BrokenPipeError when redirecting output to head/less etc.
+from signal import signal, SIGPIPE, SIG_DFL
+signal(SIGPIPE,SIG_DFL)
+
+# IBS OP DATA bit positions
+IBS_OPDATA_BR_TAKEN_SHIFT       = 35
+IBS_OPDATA_BR_MISS_SHIFT        = 36
+IBS_OPDATA_BR_RET_SHIFT         = 37
+
+# IBS OP DATA2 bit positions
+IBS_OPDATA2_DATA_SRC_LOW_SHIFT  = 0
+IBS_OPDATA2_DATA_SRC_HIGH_SHIFT = 6
+
+# IBS OP DATA3 bit positions
+IBS_OPDATA3_LDOP_SHIFT          = 0
+IBS_OPDATA3_STOP_SHIFT          = 1
+IBS_OPDATA3_L1_DTLB_MISS_SHIFT  = 2
+IBS_OPDATA3_L2_DTLB_MISS_SHIFT  = 3
+IBS_OPDATA3_DC_MISS_SHIFT       = 7
+IBS_OPDATA3_L2_MISS_SHIFT       = 20
+IBS_OPDATA3_DC_MISS_LAT_SHIFT   = 32
+IBS_OPDATA3_PHYADDR_VAL_SHIFT   = 18
+IBS_OPDATA3_DTLB_MISS_LAT_SHIFT = 48
+
+INSN_SIZE_INVAL = -1
+
+annotate_symbol = None
+annodate_dso = None
+
+#total_samples = 0
+data = []
+
+def parse_cmdline_options():
+    global annotate_symbol
+    global annodate_dso
+    global sort_order
+    global options
+
+    option_list = [
+        make_option("-d", "--dso", dest="dso",
+                    help="Path of binary or a library the symbol belongs to"),
+        make_option("-s", "--symbol", dest="symbol",
+                    help="Symbol name")
+    ]
+
+    parser = OptionParser(option_list=option_list)
+    (options, args) = parser.parse_args()
+
+    if (options.dso):
+        annodate_dso = options.dso
+    else:
+        print("Error: Invalid dso path.\n")
+        exit()
+
+    if (options.symbol):
+        annotate_symbol = options.symbol
+    else:
+        print("Error: Invalid symbol.\n")
+        exit()
+
+def disassemble_symbol(symbol, dso):
+    global data
+
+    readelf = subprocess.Popen(["readelf", "-WsC", "--sym-base=16", dso],
+                               stdout=subprocess.PIPE, text=True)
+    grep = subprocess.Popen(["grep", "-w", symbol], stdin=readelf.stdout,
+                            stdout=subprocess.PIPE, text=True)
+    output, error = grep.communicate()
+
+    if (error != None):
+        print("Error reading symbol table data for '%s'" % (symbol))
+        exit()
+
+    match = re.search(r'([^\s]+):\s([^\s]+)\s([^\s]+)\s([^\s]+)\s+([^\s]+)\s([^\s]+)\s+([^\s]+)\s([^\s]+)', output)
+    if (match == None):
+        print("Can not find start address / size of '%s'" % (symbol))
+        exit()
+
+    start_addr = int(match.group(2), 16)
+    size = int(match.group(3), 16)
+    stop_addr = start_addr + size
+
+    objdump = subprocess.run(["objdump", "-d", "-C", "--no-show-raw-insn",
+                              "--start-address", hex(start_addr), "--stop-address",
+                              hex(stop_addr), dso], capture_output = True, text = True)
+    if (objdump.returncode == 1):
+        print("Error disassembling '%s'" % (symbol))
+        exit()
+
+    disasm = objdump.stdout.split("\n")
+
+    header_lines = 1
+    # hex(<number>) converts <number> to hex with a 0x prefix. But objdump
+    # addresses omit the 0x, so use the alternative format(<number>, 'x'),
+    # which converts <number> to hex without the 0x prefix.
+    start_addr_regex = r"^\s*" + format(start_addr, 'x') + r":"
+    idx = 0
+    for line in disasm:
+        if (header_lines and (not re.match(start_addr_regex, line))):
+            continue
+        header_lines = 0
+
+        match = re.search(r'\s*([^:]+):[\t\s]+(.*)', line)
+        if (match == None):
+            continue
+
+        addr = int(match.group(1), 16)
+        offset = addr - start_addr
+        insn = re.sub(r'(<.*>)|(\s+#.*)|(\s+$)', '', match.group(2))
+
+        data.append({
+            'addr': addr,
+            'insn_size': INSN_SIZE_INVAL,
+            'symoff': offset,
+            'insn': insn,
+
+            'nr_samples': 0,
+
+            # Branch data
+            'br_ret': 0,
+            'br_miss': 0,
+            'br_taken': 0,
+            'br_fallth': 0,
+
+            # Load / Store data
+            'ld_cnt': 0, # Samples with LdOp=1 && StOp=1 are counted only in ld_cnt
+            'st_cnt': 0,
+            'dc_miss': 0,
+            'l2_miss': 0,
+            'l3_miss': 0,
+            # XXX: Breakdown beyond L3 ?
+            'dc_miss_lat': [],
+
+            'l1_dtlb_miss': 0,
+            'l2_dtlb_miss': 0,
+            'dtlb_miss_lat': [],
+        })
+
+        if (idx > 0):
+            data[idx - 1]['insn_size'] = (data[idx]['addr'] -
+                                          data[idx - 1]['addr'])
+        idx += 1
+
+parse_cmdline_options()
+disassemble_symbol(annotate_symbol, annodate_dso)
+
+def get_cpumode(cpumode):
+    if (cpumode == 1):
+        return 'K'
+    if (cpumode == 2):
+        return 'U'
+    if (cpumode == 3):
+        return 'H'
+    if (cpumode == 4):
+        return 'GK'
+    if (cpumode == 5):
+        return 'GU'
+    return '?'
+
+def is_br_ret(op_data):
+    return (op_data >> IBS_OPDATA_BR_RET_SHIFT) & 0x1
+
+def is_br_miss(op_data):
+    return (op_data >> IBS_OPDATA_BR_MISS_SHIFT) & 0x1
+
+def is_br_taken(op_data):
+    return (op_data >> IBS_OPDATA_BR_TAKEN_SHIFT) & 0x1
+
+def is_ld_op(op_data3):
+    return (op_data3 >> IBS_OPDATA3_LDOP_SHIFT) & 0x1
+
+def is_st_op(op_data3):
+    return (op_data3 >> IBS_OPDATA3_STOP_SHIFT) & 0x1
+
+def is_dc_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_DC_MISS_SHIFT) & 0x1
+
+def get_dc_miss_lat(op_data3):
+    return (op_data3 >> IBS_OPDATA3_DC_MISS_LAT_SHIFT) & 0xffff
+
+def is_l2_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_L2_MISS_SHIFT) & 0x1
+
+def get_data_src(op_data2):
+    data_src_high = (op_data2 >> IBS_OPDATA2_DATA_SRC_HIGH_SHIFT) & 0x3
+    data_src_low = (op_data2 >> IBS_OPDATA2_DATA_SRC_LOW_SHIFT) & 0x7
+    return (data_src_high << 3) | data_src_low
+
+def is_phy_addr_val(op_data3):
+    return (op_data3 >> IBS_OPDATA3_PHYADDR_VAL_SHIFT) & 0x1
+
+def is_l1_dtlb_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_L1_DTLB_MISS_SHIFT) & 0x1
+
+def get_dtlb_miss_lat(op_data3):
+    return (op_data3 >> IBS_OPDATA3_DTLB_MISS_LAT_SHIFT) & 0xffff
+
+def is_l2_dtlb_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_L2_DTLB_MISS_SHIFT) & 0x1
+
+def process_event(param_dict):
+    global data
+
+    raw_buf = param_dict['raw_buf']
+    op_data = int.from_bytes(raw_buf[20:28], "little")
+    op_data2 = int.from_bytes(raw_buf[28:36], "little")
+    op_data3 = int.from_bytes(raw_buf[36:44], "little")
+
+    if ('symbol' not in param_dict):
+        return
+
+    symbol = param_dict['symbol']
+    symbol = re.sub(r'\(.*\)', '', symbol)
+
+    if (symbol != annotate_symbol):
+        return
+
+    symoff = 0
+    if ('symoff' in param_dict):
+        symoff = param_dict['symoff']
+
+    idx = 0
+    for d in data:
+        if (d['symoff'] <= symoff and
+            (d['insn_size'] == INSN_SIZE_INVAL or
+             d['symoff'] + d['insn_size'] > symoff)):
+            break
+        else:
+            idx += 1
+
+    d = data[idx]
+
+    d['nr_samples'] += 1
+    #total_samples += 1
+
+    if (is_br_ret(op_data)):
+        d['br_ret'] += 1
+        if (is_br_miss(op_data)):
+            d['br_miss'] += 1
+        if (is_br_taken(op_data)):
+            d['br_taken'] += 1
+
+    ld_st = 0
+    if (is_ld_op(op_data3)):
+        d['ld_cnt'] += 1
+        ld_st = 1
+    elif (is_st_op(op_data3)):
+        d['st_cnt'] += 1
+        ld_st = 1
+
+    if (ld_st == 1):
+        if (is_dc_miss(op_data3)):
+            d['dc_miss'] += 1
+            dc_miss_lat = get_dc_miss_lat(op_data3)
+            d['dc_miss_lat'].append(dc_miss_lat)
+            if (is_l2_miss(op_data3)):
+                d['l2_miss'] += 1
+                if (get_data_src(op_data2) > 1):
+                    d['l3_miss'] += 1
+        if (is_phy_addr_val(op_data3)):
+            if (is_l1_dtlb_miss(op_data3)):
+                d['l1_dtlb_miss'] += 1
+                dtlb_miss_lat = get_dtlb_miss_lat(op_data3)
+                d['dtlb_miss_lat'].append(dtlb_miss_lat)
+                if (is_l2_dtlb_miss(op_data3)):
+                    d['l2_dtlb_miss'] += 1
+
+def print_header():
+    addr_width = len(format(data[0]['addr'], 'x')) + 32
+    pattern = ("%-" + str(addr_width) + "s | %7s | %7s %7s %9s %7s %9s %7s %9s %7s"
+               " %7s | %7s %9s %7s %9s %7s %7s | %15s %9s")
+    print(pattern % ("", "Nr", "", "", "", "", "", "", "", "90th", "Avg", "L1Dtlb", "",
+                    "L2Dtlb", "", "90th", "Avg", "Branch", ""))
+    print(pattern % ("Disassembly", "Samples", "LdSt", "DcMiss", "(%)", "L2Miss", "(%)",
+                     "L3Miss", "(%)", "PctLat", "Lat", "Miss", "(%)", "Miss", "(%)",
+                     "PctLat", "Lat", "Miss/Retired", "(%)"))
+    print("--------------------------------------------------------------------------------------"
+          "--------------------------------------------------------------------------------------"
+          "------------------------------------------------")
+
+def print_footer():
+    print("Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples")
+    print("--------------------------------------------------------------------------------------"
+          "--------------------------------------------------------------------------------------"
+          "------------------------------------------------")
+def trace_end():
+    global data
+
+    print_header()
+
+    for d in data:
+        dc_miss_perc = 0
+        l2_miss_perc = 0
+        l3_miss_perc = 0
+        l1_dtlb_miss_perc = 0
+        l2_dtlb_miss_perc = 0
+        avg_dc_miss_lat = 0
+        pct_dc_miss_lat = 0
+        avg_dtlb_miss_lat = 0
+        pct_dtlb_miss_lat = 0
+        if (d['ld_cnt'] or d['st_cnt']):
+            dc_miss_perc = (d['dc_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
+            l2_miss_perc = (d['l2_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
+            l3_miss_perc = (d['l3_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
+            l1_dtlb_miss_perc = (d['l1_dtlb_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
+            l2_dtlb_miss_perc = (d['l2_dtlb_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
+            if (d['dc_miss_lat']):
+                avg_dc_miss_lat = sum(d['dc_miss_lat']) / float(len(d['dc_miss_lat']))
+                pct_dc_miss_lat = np.percentile(d['dc_miss_lat'], 90)
+            if (d['dtlb_miss_lat']):
+                avg_dtlb_miss_lat = sum(d['dtlb_miss_lat']) / float(len(d['dtlb_miss_lat']))
+                pct_dtlb_miss_lat = np.percentile(d['dtlb_miss_lat'], 90)
+
+        br_miss_perc = 0
+        if (d['br_ret']):
+            br_miss_perc = (d['br_miss'] * 100) / float(d['br_ret'])
+
+        print("%x: %-30s | %7d | %7d %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%)"
+              " %7d %7d | %7d (%6.2f%%) %7d (%6.2f%%) %7d %7d | %7d/%-7d (%6.2f%%)" %
+              (d['addr'], d['insn'], d['nr_samples'], d['ld_cnt'] + d['st_cnt'],
+               d['dc_miss'], dc_miss_perc, d['l2_miss'], l2_miss_perc,
+               d['l3_miss'], l3_miss_perc, pct_dc_miss_lat, avg_dc_miss_lat,
+               d['l1_dtlb_miss'], l1_dtlb_miss_perc, d['l2_dtlb_miss'],
+               l2_dtlb_miss_perc, pct_dtlb_miss_lat, avg_dtlb_miss_lat,
+               d['br_miss'], d['br_ret'], br_miss_perc))
+
+    print_footer()
diff --git a/tools/perf/scripts/python/amd-ibs-op-metrics.py b/tools/perf/scripts/python/amd-ibs-op-metrics.py
new file mode 100644
index 000000000000..67c0b2f9d79a
--- /dev/null
+++ b/tools/perf/scripts/python/amd-ibs-op-metrics.py
@@ -0,0 +1,285 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2025 Advanced Micro Devices, Inc.
+#
+# Print various metric events at function granularity using AMD IBS Op PMU.
+
+from __future__ import print_function
+
+import os
+import sys
+import re
+import numpy as np
+from optparse import OptionParser, make_option
+
+# Restore default SIGPIPE handling to avoid BrokenPipeError when piping output to head/less etc.
+from signal import signal, SIGPIPE, SIG_DFL
+signal(SIGPIPE, SIG_DFL)
+
+# IBS OP DATA bit positions
+IBS_OPDATA_BR_TAKEN_SHIFT       = 35
+IBS_OPDATA_BR_MISS_SHIFT        = 36
+IBS_OPDATA_BR_RET_SHIFT         = 37
+
+# IBS OP DATA2 bit positions
+IBS_OPDATA2_DATA_SRC_LOW_SHIFT  = 0
+IBS_OPDATA2_DATA_SRC_HIGH_SHIFT = 6
+
+# IBS OP DATA3 bit positions
+IBS_OPDATA3_LDOP_SHIFT          = 0
+IBS_OPDATA3_STOP_SHIFT          = 1
+IBS_OPDATA3_L1_DTLB_MISS_SHIFT  = 2
+IBS_OPDATA3_L2_DTLB_MISS_SHIFT  = 3
+IBS_OPDATA3_DC_MISS_SHIFT       = 7
+IBS_OPDATA3_PHYADDR_VAL_SHIFT   = 18
+IBS_OPDATA3_L2_MISS_SHIFT       = 20
+IBS_OPDATA3_DC_MISS_LAT_SHIFT   = 32
+IBS_OPDATA3_DTLB_MISS_LAT_SHIFT = 48
+
+allowed_sort_keys = ("nr_samples", "dc_miss", "l2_miss", "l3_miss", "l1_dtlb_miss", "l2_dtlb_miss", "br_miss")
+default_sort_order = ("nr_samples",) # Trailing comma is needed for a single-member tuple
+sort_order = default_sort_order
+options = None
+
+def parse_cmdline_options():
+    global sort_order
+    global options
+
+    option_list = [
+        make_option("-s", "--sort", dest="sort",
+                    help="Comma-separated custom sort order. Allowed values: " +
+                         ", ".join(allowed_sort_keys))
+    ]
+
+    parser = OptionParser(option_list=option_list)
+    (options, args) = parser.parse_args()
+
+    if (options.sort):
+        sort_err = 0
+        temp = []
+        for sort_option in options.sort.split(","):
+            if sort_option not in allowed_sort_keys:
+                print("ERROR: Invalid sort option: %s" % sort_option)
+                print("       Falling back to default sort order.")
+                sort_err = 1
+                break
+            else:
+                temp.append(sort_option)
+
+        if (sort_err == 0):
+            sort_order = tuple(temp)
+
+parse_cmdline_options()
+
+# Final data
+data = {}
+
+def init_data_element(symbol, cpumode, dso):
+    # XXX: Should the key be dso:symbol ?
+    data[symbol] = {
+        'nr_samples': 0,
+        'cpumode': cpumode,
+
+        # Branch data
+        'br_ret': 0,
+        'br_miss': 0,
+        'br_taken': 0,
+        'br_fallth': 0,
+
+        # Load / Store data
+        'ld_cnt': 0, # Ops with LdOp=1 && StOp=1 are counted only in ld_cnt
+        'st_cnt': 0,
+        'dc_miss': 0,
+        'l2_miss': 0,
+        'l3_miss': 0,
+        # XXX: Breakdown beyond L3 ?
+        'dc_miss_lat': [],
+
+        'l1_dtlb_miss': 0,
+        'l2_dtlb_miss': 0,
+        'dtlb_miss_lat': [],
+
+        # Misc data
+        'dso': dso,
+    }
+
+def get_cpumode(cpumode):
+    if (cpumode == 1):
+        return 'K'
+    if (cpumode == 2):
+        return 'U'
+    if (cpumode == 3):
+        return 'H'
+    if (cpumode == 4):
+        return 'GK'
+    if (cpumode == 5):
+        return 'GU'
+    return '?'
+
+def is_br_ret(op_data):
+    return (op_data >> IBS_OPDATA_BR_RET_SHIFT) & 0x1
+
+def is_br_miss(op_data):
+    return (op_data >> IBS_OPDATA_BR_MISS_SHIFT) & 0x1
+
+def is_br_taken(op_data):
+    return (op_data >> IBS_OPDATA_BR_TAKEN_SHIFT) & 0x1
+
+def is_ld_op(op_data3):
+    return (op_data3 >> IBS_OPDATA3_LDOP_SHIFT) & 0x1
+
+def is_st_op(op_data3):
+    return (op_data3 >> IBS_OPDATA3_STOP_SHIFT) & 0x1
+
+def is_dc_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_DC_MISS_SHIFT) & 0x1
+
+def get_dc_miss_lat(op_data3):
+    return (op_data3 >> IBS_OPDATA3_DC_MISS_LAT_SHIFT) & 0xffff
+
+def is_l2_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_L2_MISS_SHIFT) & 0x1
+
+def get_data_src(op_data2):
+    data_src_high = (op_data2 >> IBS_OPDATA2_DATA_SRC_HIGH_SHIFT) & 0x3
+    data_src_low = (op_data2 >> IBS_OPDATA2_DATA_SRC_LOW_SHIFT) & 0x7
+    return (data_src_high << 3) | data_src_low
+
+def is_phy_addr_val(op_data3):
+    return (op_data3 >> IBS_OPDATA3_PHYADDR_VAL_SHIFT) & 0x1
+
+def is_l1_dtlb_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_L1_DTLB_MISS_SHIFT) & 0x1
+
+def get_dtlb_miss_lat(op_data3):
+    return (op_data3 >> IBS_OPDATA3_DTLB_MISS_LAT_SHIFT) & 0xffff
+
+def is_l2_dtlb_miss(op_data3):
+    return (op_data3 >> IBS_OPDATA3_L2_DTLB_MISS_SHIFT) & 0x1
+
+def process_event(param_dict):
+    raw_buf = param_dict['raw_buf']
+    op_data = int.from_bytes(raw_buf[20:28], "little")
+    op_data2 = int.from_bytes(raw_buf[28:36], "little")
+    op_data3 = int.from_bytes(raw_buf[36:44], "little")
+
+    if ('symbol' in param_dict):
+        symbol = param_dict['symbol']
+        symbol = re.sub(r'\(.*\)', '', symbol)
+    else:
+        symbol = hex(param_dict['sample']['ip'])
+
+    if (symbol not in data):
+        init_data_element(symbol, get_cpumode(param_dict['sample']['cpumode']),
+                          param_dict['dso'] if 'dso' in param_dict else "")
+
+    data[symbol]['nr_samples'] += 1
+
+    if (is_br_ret(op_data)):
+        data[symbol]['br_ret'] += 1
+        if (is_br_miss(op_data)):
+            data[symbol]['br_miss'] += 1
+        if (is_br_taken(op_data)):
+            data[symbol]['br_taken'] += 1
+
+    ld_st = 0
+    if (is_ld_op(op_data3)):
+        data[symbol]['ld_cnt'] += 1
+        ld_st = 1
+    elif (is_st_op(op_data3)):
+        data[symbol]['st_cnt'] += 1
+        ld_st = 1
+
+    if (ld_st == 1):
+        if (is_dc_miss(op_data3)):
+            data[symbol]['dc_miss'] += 1
+            dc_miss_lat = get_dc_miss_lat(op_data3)
+            data[symbol]['dc_miss_lat'].append(dc_miss_lat)
+            if (is_l2_miss(op_data3)):
+                data[symbol]['l2_miss'] += 1
+                if (get_data_src(op_data2) > 1):
+                    data[symbol]['l3_miss'] += 1
+        if (is_phy_addr_val(op_data3)):
+            if (is_l1_dtlb_miss(op_data3)):
+                data[symbol]['l1_dtlb_miss'] += 1
+                dtlb_miss_lat = get_dtlb_miss_lat(op_data3)
+                data[symbol]['dtlb_miss_lat'].append(dtlb_miss_lat)
+                if (is_l2_dtlb_miss(op_data3)):
+                    data[symbol]['l2_dtlb_miss'] += 1
+
+def print_sort_order():
+    global sort_order
+    print("Sort Order: " + ",".join(sort_order))
+
+def print_header():
+    print_sort_order()
+    print("Percentages: Cache miss and TLB miss percentages are relative to NrLdSt, not NrSamples")
+    print("%-45s| %7s | %7s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s %7s %9s %7s %7s | %15s %9s | %s" %
+          ("","Nr", "Nr", "", "", "", "", "", "", "90th", "Avg", "L1Dtlb", "", "L2Dtlb", "", "90th",
+           "Avg", "Branch", "", ""))
+    print("%-45s| %7s | %7s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s %7s %9s %7s %7s | %15s %9s | %s" %
+          ("function","Samples", "LdSt", "DcMiss", "(%)", "L2Miss", "(%)", "L3Miss", "(%)",
+           "PctLat", "Lat", "Miss", "(%)", "Miss", "(%)", "PctLat", "Lat", "Miss/Retired", "(%)", "dso"))
+    print("--------------------------------------------------------------------------------------"
+          "--------------------------------------------------------------------------------------"
+          "----------------------------------------------------------------")
+
+def print_footer():
+    print("--------------------------------------------------------------------------------------"
+          "--------------------------------------------------------------------------------------"
+          "----------------------------------------------------------------")
+    print()
+
+def sort_fun(item):
+    global sort_order
+
+    temp = []
+    for sort_option in sort_order:
+        temp.append(item[1][sort_option])
+    return tuple(temp)
+
+def trace_end():
+    sorted_data = sorted(data.items(), key = sort_fun, reverse = True)
+
+    print_header()
+
+    for d in sorted_data:
+        symbol_cpumode = d[0] + " [" + d[1]['cpumode'] + "]"
+
+        dc_miss_perc = 0
+        l2_miss_perc = 0
+        l3_miss_perc = 0
+        l1_dtlb_miss_perc = 0
+        l2_dtlb_miss_perc = 0
+        avg_dc_miss_lat = 0
+        pct_dc_miss_lat = 0
+        avg_dtlb_miss_lat = 0
+        pct_dtlb_miss_lat = 0
+        if (d[1]['ld_cnt'] or d[1]['st_cnt']):
+            dc_miss_perc = (d[1]['dc_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
+            l2_miss_perc = (d[1]['l2_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
+            l3_miss_perc = (d[1]['l3_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
+            l1_dtlb_miss_perc = (d[1]['l1_dtlb_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
+            l2_dtlb_miss_perc = (d[1]['l2_dtlb_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
+            if (d[1]['dc_miss_lat']):
+                avg_dc_miss_lat = sum(d[1]['dc_miss_lat']) / float(len(d[1]['dc_miss_lat']))
+                pct_dc_miss_lat = np.percentile(d[1]['dc_miss_lat'], 90)
+            if (d[1]['dtlb_miss_lat']):
+                avg_dtlb_miss_lat = sum(d[1]['dtlb_miss_lat']) / float(len(d[1]['dtlb_miss_lat']))
+                pct_dtlb_miss_lat = np.percentile(d[1]['dtlb_miss_lat'], 90)
+
+        br_miss_perc = 0
+        if (d[1]['br_ret']):
+            br_miss_perc = (d[1]['br_miss'] * 100) / float(d[1]['br_ret'])
+
+        print("%-45s| %7d | %7d %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%)"
+              " %7d %7d | %7d (%6.2f%%) %7d (%6.2f%%) %7d %7d | %7d/%-7d (%6.2f%%) | %s" %
+              (symbol_cpumode, d[1]['nr_samples'],
+               d[1]['ld_cnt'] + d[1]['st_cnt'], d[1]['dc_miss'], dc_miss_perc,
+               d[1]['l2_miss'], l2_miss_perc, d[1]['l3_miss'], l3_miss_perc,
+               pct_dc_miss_lat, avg_dc_miss_lat, d[1]['l1_dtlb_miss'],
+               l1_dtlb_miss_perc, d[1]['l2_dtlb_miss'], l2_dtlb_miss_perc,
+               pct_dtlb_miss_lat, avg_dtlb_miss_lat,
+               d[1]['br_miss'], d[1]['br_ret'], br_miss_perc, d[1]['dso']))
+
+    print_footer()
-- 
2.43.0
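
For anyone who wants to sanity-check the field extraction in
amd-ibs-op-metrics.py without IBS hardware, here is a minimal standalone
sketch. It is not part of the patch. It builds a synthetic raw buffer laid
out at the same offsets process_event() reads from (20, 28 and 36) and
decodes a few fields using the shifts and masks from the script. The helper
names make_raw_buf() and decode(), and the sample register values, are
hypothetical and exist only for illustration.

```python
import struct

# Bit positions copied from amd-ibs-op-metrics.py in this patch.
IBS_OPDATA_BR_MISS_SHIFT = 36
IBS_OPDATA_BR_RET_SHIFT = 37
IBS_OPDATA2_DATA_SRC_LOW_SHIFT = 0
IBS_OPDATA2_DATA_SRC_HIGH_SHIFT = 6
IBS_OPDATA3_LDOP_SHIFT = 0
IBS_OPDATA3_DC_MISS_SHIFT = 7
IBS_OPDATA3_L2_MISS_SHIFT = 20
IBS_OPDATA3_DC_MISS_LAT_SHIFT = 32


def make_raw_buf(op_data, op_data2, op_data3):
    # Hypothetical helper: 20 zero bytes stand in for whatever precedes
    # op_data in a real sample, followed by three little-endian 64-bit words.
    return bytes(20) + struct.pack("<QQQ", op_data, op_data2, op_data3)


def decode(raw_buf):
    # Same slicing as process_event() in the script.
    op_data = int.from_bytes(raw_buf[20:28], "little")
    op_data2 = int.from_bytes(raw_buf[28:36], "little")
    op_data3 = int.from_bytes(raw_buf[36:44], "little")

    # Same combination of high and low bits as get_data_src() in the script.
    src_high = (op_data2 >> IBS_OPDATA2_DATA_SRC_HIGH_SHIFT) & 0x3
    src_low = (op_data2 >> IBS_OPDATA2_DATA_SRC_LOW_SHIFT) & 0x7

    return {
        "br_ret": (op_data >> IBS_OPDATA_BR_RET_SHIFT) & 0x1,
        "br_miss": (op_data >> IBS_OPDATA_BR_MISS_SHIFT) & 0x1,
        "ld_op": (op_data3 >> IBS_OPDATA3_LDOP_SHIFT) & 0x1,
        "dc_miss": (op_data3 >> IBS_OPDATA3_DC_MISS_SHIFT) & 0x1,
        "l2_miss": (op_data3 >> IBS_OPDATA3_L2_MISS_SHIFT) & 0x1,
        "dc_miss_lat": (op_data3 >> IBS_OPDATA3_DC_MISS_LAT_SHIFT) & 0xffff,
        "data_src": (src_high << 3) | src_low,
    }


# Made-up sample: a retired, mispredicted branch op word, plus a load that
# missed DC and L2 with a recorded miss latency of 250.
sample = make_raw_buf(
    op_data=(1 << IBS_OPDATA_BR_RET_SHIFT) | (1 << IBS_OPDATA_BR_MISS_SHIFT),
    op_data2=(1 << IBS_OPDATA2_DATA_SRC_HIGH_SHIFT) | 0x3,
    op_data3=(1 << IBS_OPDATA3_LDOP_SHIFT) | (1 << IBS_OPDATA3_DC_MISS_SHIFT)
             | (1 << IBS_OPDATA3_L2_MISS_SHIFT)
             | (250 << IBS_OPDATA3_DC_MISS_LAT_SHIFT),
)
fields = decode(sample)
print(fields)
```

A single hardware sample would not normally carry both branch and load
information in this combination; the values are chosen only to exercise each
shift and mask.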