linux-kernel - Re: [RFC] perf script AMD/IBS: Add scripts to show function/instruction level granular profile

Message-ID: <CAP-5=fW4Vzhs7BOeAhom5csRUk+UkCFdc1H9HT4AhMdei8FRKQ@mail.gmail.com>
Date: Fri, 24 Jan 2025 08:16:10 -0800
From: Ian Rogers <irogers@...gle.com>
To: Ravi Bangoria <ravi.bangoria@....com>
Cc: acme@...nel.org, namhyung@...nel.org, peterz@...radead.org, 
	mingo@...hat.com, eranian@...gle.com, kan.liang@...ux.intel.com, 
	jolsa@...nel.org, adrian.hunter@...el.com, alexander.shishkin@...ux.intel.com, 
	bp@...en8.de, mark.rutland@....com, linux-kernel@...r.kernel.org, 
	linux-perf-users@...r.kernel.org, santosh.shukla@....com, 
	ananth.narayan@....com, sandipan.das@....com
Subject: Re: [RFC] perf script AMD/IBS: Add scripts to show
 function/instruction level granular profile

On Thu, Jan 23, 2025 at 10:07 PM Ravi Bangoria <ravi.bangoria@....com> wrote:
>
> AMD IBS (Instruction Based Sampling) PMUs provide various insights
> about instruction execution through the front-end and back-end units.
> Various perf tools (e.g. precise-mode (:p), perf-mem, perf-c2c etc.)
> use a portion of this information, but a lot of other insightful data
> still remains unused by perf. I could not think of any generic perf
> tool where I could consolidate and show all this data, so I thought to
> add perf-python scripts.
>
> 1) amd-ibs-op-metrics.py: Print various back-end metric events at
>    function granularity using AMD IBS Op PMU.
> 2) amd-ibs-op-metrics-annotate.py: Print various back-end metric events
>    at instruction granularity using AMD IBS Op PMU.
> 3) amd-ibs-fetch-metrics.py: Print various front-end metric events at
>    function granularity using AMD IBS Fetch PMU.
>    (Annotate script can be added for Fetch PMU as well).
>
> This is still an early prototype and thus has a lot of rough edges.
> Please feel free to report bugs/enhancements if you find these to be
> useful.
>
> Example usage:
>
> IBS Op:
>
>   # perf record -a -e ibs_op// -c 1000000 --raw-sample -- make
>   [ perf record: Woken up 91 times to write data ]
>   [ perf record: Captured and wrote 49.926 MB perf.data (386979 samples) ]
>
>   # perf script -s amd-ibs-op-metrics.py -- --sort=dc_miss,l2_miss | head -15
>   Sort Order: dc_miss,l2_miss
>   Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples
>                                                |      Nr |      Nr                                                          90th     Avg |  L1Dtlb            L2Dtlb              90th     Avg |          Branch           |
>   function                                     | Samples |    LdSt  DcMiss       (%)  L2Miss       (%)  L3Miss       (%)  PctLat     Lat |    Miss       (%)    Miss       (%)  PctLat     Lat |    Miss/Retired       (%) | dso
>   --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   clear_page_erms [K]                          |    6704 |    6059    4767 ( 78.68%)    4085 ( 67.42%)    4027 ( 66.46%)       0       0 |      13 (  0.21%)       4 (  0.07%)      76      80 |       0/5       (  0.00%) | [kernel.kallsyms]
>   __memmove_avx512_unaligned_erms [U]          |    6274 |    2461    1298 ( 52.74%)    1099 ( 44.66%)     725 ( 29.46%)     465     265 |     996 ( 40.47%)     668 ( 27.14%)     137      88 |      53/2032    (  2.61%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   __memset_avx512_unaligned_erms [U]           |    2759 |    1343     664 ( 49.44%)     345 ( 25.69%)     143 ( 10.65%)       0       0 |     122 (  9.08%)      20 (  1.49%)      94      44 |      20/317     (  6.31%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   _copy_to_iter [K]                            |     918 |     640     351 ( 54.84%)     231 ( 36.09%)     163 ( 25.47%)    1341     391 |      13 (  2.03%)       5 (  0.78%)    1567     369 |       0/3       (  0.00%) | [kernel.kallsyms]
>   pop_scope [U]                                |    1648 |     960     302 ( 31.46%)     258 ( 26.88%)     224 ( 23.33%)    1515     493 |      59 (  6.15%)      15 (  1.56%)     782     205 |       6/534     (  1.12%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
>   memset [K]                                   |     776 |     505     185 ( 36.63%)      61 ( 12.08%)      46 (  9.11%)       0       0 |       3 (  0.59%)       2 (  0.40%)    4985    2200 |       0/9       (  0.00%) | [kernel.kallsyms]
>   _int_malloc [U]                              |    4534 |    1523     178 ( 11.69%)      43 (  2.82%)       6 (  0.39%)      40      25 |      88 (  5.78%)      12 (  0.79%)      84      42 |     103/1141    (  9.03%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   ggc_internal_alloc [U]                       |    2891 |    1254     138 ( 11.00%)      78 (  6.22%)      45 (  3.59%)     905     267 |      80 (  6.38%)       1 (  0.08%)      10      17 |      16/448     (  3.57%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
>   native_queued_spin_lock_slowpath [K]         |   36544 |   17736     125 (  0.70%)     124 (  0.70%)     115 (  0.65%)     695     390 |       0 (  0.00%)       0 (  0.00%)       0       0 |      18/17327   (  0.10%) | [kernel.kallsyms]
>   get_mem_cgroup_from_mm [K]                   |     985 |     341     122 ( 35.78%)       9 (  2.64%)       1 (  0.29%)      23      19 |      74 ( 21.70%)       0 (  0.00%)       7       7 |       0/297     (  0.00%) | [kernel.kallsyms]
>
>   o Default sort order is Nr Samples.
>   o Cache misses and TLB misses percentages are wrt Nr LdSt. Branch
>     miss percentages are wrt branches retired.
>   o Use --help for more detail.
>
> IBS Op Annotate:
>
>   # perf script -s amd-ibs-op-metrics-annotate.py -- --dso=/home/ravi/linux/vmlinux --symbol=clear_page_erms
>                                                    |      Nr |                                                                  90th     Avg |  L1Dtlb            L2Dtlb              90th     Avg |          Branch
>   Disassembly                                      | Samples |    LdSt  DcMiss       (%)  L2Miss       (%)  L3Miss       (%)  PctLat     Lat |    Miss       (%)    Miss       (%)  PctLat     Lat |    Miss/Retired       (%)
>   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   ffffffff821d3e10: mov    $0x1000,%ecx            |       6 |       0       0 (  0.00%)       0 (  0.00%)       0 (  0.00%)       0       0 |       0 (  0.00%)       0 (  0.00%)       0       0 |       0/0       (  0.00%)
>   ffffffff821d3e15: xor    %eax,%eax               |       4 |       0       0 (  0.00%)       0 (  0.00%)       0 (  0.00%)       0       0 |       0 (  0.00%)       0 (  0.00%)       0       0 |       0/0       (  0.00%)
>   ffffffff821d3e17: rep stos %al,%es:(%rdi)        |    6687 |    6059    4767 ( 78.68%)    4085 ( 67.42%)    4027 ( 66.46%)       0       0 |      13 (  0.21%)       4 (  0.07%)      76      80 |       0/0       (  0.00%)
>   ffffffff821d3e19: jmp    ffffffff821f27a0        |       7 |       0       0 (  0.00%)       0 (  0.00%)       0 (  0.00%)       0       0 |       0 (  0.00%)       0 (  0.00%)       0       0 |       0/5       (  0.00%)
>   Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples
>   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>   o Actual disassembly of the function, so data are not sorted.
>   o Cache misses and TLB misses percentages are wrt Nr LdSt. Branch
>     miss percentages are wrt branches retired.
>
> IBS Fetch:
>
>   # perf record -a -e ibs_fetch// -c 1000000 --raw-sample -- make
>   [ perf record: Woken up 4 times to write data ]
>   [ perf record: Captured and wrote 15.051 MB perf.data (112595 samples) ]
>
>   # perf script -s amd-ibs-fetch-metrics.py -- --sort=ic_miss | head -15
>   Sort Order: ic_miss
>                                                |      Nr |                                                                            90th     Avg |   Fetch           |  L1Itlb            L2Itlb           |
>   function                                     | Samples |  OcMiss       (%)  IcMiss       (%)  L2Miss       (%)  L3Miss       (%)  PctLat     Lat |   Abort       (%) |    Miss       (%)    Miss       (%) | dso
>   ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   _int_malloc [U]                              |    1379 |     407 ( 29.51%)     130 (  9.43%)       1 (  0.07%)       0 (  0.00%)      20      14 |       0 (  0.00%) |      11 (  0.80%)       5 (  0.36%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   _cpp_lex_direct [U]                          |    1621 |     133 (  8.20%)      35 (  2.16%)       1 (  0.06%)       0 (  0.00%)      26      16 |       0 (  0.00%) |       1 (  0.06%)       1 (  0.06%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
>   mas_walk [K]                                 |     115 |      75 ( 65.22%)      33 ( 28.70%)       0 (  0.00%)       0 (  0.00%)      20      14 |       0 (  0.00%) |       0 (  0.00%)       0 (  0.00%) | [kernel.kallsyms]
>   _int_free [U]                                |     598 |      83 ( 13.88%)      32 (  5.35%)       0 (  0.00%)       0 (  0.00%)      17      13 |       0 (  0.00%) |       5 (  0.84%)       3 (  0.50%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   __libc_calloc [U]                            |     202 |      72 ( 35.64%)      31 ( 15.35%)       0 (  0.00%)       0 (  0.00%)      24      27 |       0 (  0.00%) |      10 (  4.95%)       6 (  2.97%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   ggc_internal_alloc [U]                       |     516 |     102 ( 19.77%)      29 (  5.62%)       0 (  0.00%)       0 (  0.00%)      19      14 |       0 (  0.00%) |       6 (  1.16%)       4 (  0.78%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
>   _int_free_merge_chunk [U]                    |     219 |      58 ( 26.48%)      29 ( 13.24%)       0 (  0.00%)       0 (  0.00%)      18      14 |       0 (  0.00%) |       4 (  1.83%)       0 (  0.00%) | /usr/lib/x86_64-linux-gnu/libc.so.6
>   get_page_from_freelist [K]                   |      68 |      45 ( 66.18%)      28 ( 41.18%)       1 (  1.47%)       0 (  0.00%)      27      23 |       0 (  0.00%) |       0 (  0.00%)       0 (  0.00%) | [kernel.kallsyms]
>   __handle_mm_fault [K]                        |      70 |      43 ( 61.43%)      26 ( 37.14%)       2 (  2.86%)       0 (  0.00%)      17      15 |       0 (  0.00%) |       0 (  0.00%)       0 (  0.00%) | [kernel.kallsyms]
>   operand_compare::operand_equal_p [U]         |     364 |      82 ( 22.53%)      26 (  7.14%)       1 (  0.27%)       0 (  0.00%)      18      14 |       0 (  0.00%) |       8 (  2.20%)       6 (  1.65%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
>   bitmap_set_bit [U]                           |    1917 |      81 (  4.23%)      25 (  1.30%)       0 (  0.00%)       0 (  0.00%)      23      15 |       0 (  0.00%) |      10 (  0.52%)       8 (  0.42%) | /usr/libexec/gcc/x86_64-linux-gnu/13/cc1
>
>   o Default sort order is Nr Samples.
>   o All percentages are wrt Nr Samples.
>   o Use --help for more detail.

Really nice!
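
To make the denominators in the tables above concrete, here is a small
sketch that recomputes a few of the printed percentages. The helper
name `pct` is made up for illustration; the counts are copied from the
quoted output.

```python
def pct(count, total):
    """Share of count in total as a percentage, 0.0 when total is 0."""
    return 100.0 * count / total if total else 0.0

# IBS Op, clear_page_erms: DcMiss is relative to Nr LdSt (6059),
# not to Nr Samples (6704).
print(f"{pct(4767, 6059):6.2f}%")   # 78.68%

# IBS Op, __memmove_avx512_unaligned_erms: branch misses are relative
# to branches retired.
print(f"{pct(53, 2032):6.2f}%")     # 2.61%

# IBS Fetch, _int_malloc: OcMiss is relative to Nr Samples.
print(f"{pct(407, 1379):6.2f}%")    # 29.51%
```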

> Signed-off-by: Ravi Bangoria <ravi.bangoria@....com>
> ---
>  .../scripts/python/amd-ibs-fetch-metrics.py   | 219 +++++++++++
>  .../python/amd-ibs-op-metrics-annotate.py     | 342 ++++++++++++++++++
>  .../perf/scripts/python/amd-ibs-op-metrics.py | 285 +++++++++++++++
>  3 files changed, 846 insertions(+)
>  create mode 100644 tools/perf/scripts/python/amd-ibs-fetch-metrics.py
>  create mode 100644 tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py
>  create mode 100644 tools/perf/scripts/python/amd-ibs-op-metrics.py
>
> diff --git a/tools/perf/scripts/python/amd-ibs-fetch-metrics.py b/tools/perf/scripts/python/amd-ibs-fetch-metrics.py
> new file mode 100644
> index 000000000000..63a91843585f
> --- /dev/null
> +++ b/tools/perf/scripts/python/amd-ibs-fetch-metrics.py
> @@ -0,0 +1,219 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2025 Advanced Micro Devices, Inc.
> +#
> +# Print various metric events at function granularity using AMD IBS Fetch PMU.
> +
> +from __future__ import print_function

I think at some future point we should go through the perf python code
and strip out Python 2-isms like this. There's no need to add more, as
Python 2 is no longer supported.

> +
> +import os
> +import sys

From a quick check, these imports (os, sys) don't appear to be used.

> +import re
> +import numpy as np
> +from optparse import OptionParser, make_option
> +
> +# To avoid BrokenPipeError when redirecting output to head/less etc.
> +from signal import signal, SIGPIPE, SIG_DFL
> +signal(SIGPIPE,SIG_DFL)
> +
> +# IBS FETCH CTL bit positions
> +IBS_FETCH_CTL_FETCH_LAT_SHIFT       = 32
> +IBS_FETCH_CTL_IC_MISS_SHIFT         = 51
> +IBS_FETCH_CTL_L1_ITLB_MISS_SHIFT    = 55
> +IBS_FETCH_CTL_L2_ITLB_MISS_SHIFT    = 56
> +IBS_FETCH_CTL_L2_MISS_SHIFT         = 58
> +IBS_FETCH_CTL_OC_MISS_SHIFT         = 60
> +IBS_FETCH_CTL_L3_MISS_SHIFT         = 61
> +IBS_FETCH_CTL_FETCH_COMP            = 50
> +
> +allowed_sort_keys = ("nr_samples", "oc_miss", "ic_miss", "l2_miss", "l3_miss", "abort", "l1_itlb_miss", "l2_itlb_miss")
> +default_sort_order = ("nr_samples",) # Trailing comman is needed for single member tuple

Given these are lists of strings, I'm not sure why you're trying to use tuples?
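
A rough sketch of what the list-based variant could look like. The
names `parse_sort_order` and `sort_key` are hypothetical (the patch
uses module globals and `sort_fun`); the key names and messages are
copied from the patch.

```python
ALLOWED_SORT_KEYS = ["nr_samples", "oc_miss", "ic_miss", "l2_miss", "l3_miss",
                     "abort", "l1_itlb_miss", "l2_itlb_miss"]
DEFAULT_SORT_ORDER = ["nr_samples"]  # a one-element list needs no trailing comma

def parse_sort_order(arg):
    """Return the sort keys named in a comma separated string.

    Falls back to the default order when arg is empty or names an
    unknown key.
    """
    if not arg:
        return list(DEFAULT_SORT_ORDER)
    keys = arg.split(",")
    invalid = [k for k in keys if k not in ALLOWED_SORT_KEYS]
    if invalid:
        print("ERROR: Invalid sort option: %s" % invalid[0])
        print("       Falling back to default sort order.")
        return list(DEFAULT_SORT_ORDER)
    return keys

def sort_key(item, sort_order):
    """Key for sorted(): lists compare element by element, like tuples."""
    return [item[1][k] for k in sort_order]

# Usage with a (symbol -> counters) dict shaped like 'data' in the patch.
rows = {"a": {"ic_miss": 1, "l2_miss": 5}, "b": {"ic_miss": 2, "l2_miss": 0}}
order = parse_sort_order("ic_miss,l2_miss")
ranked = sorted(rows.items(), key=lambda item: sort_key(item, order),
                reverse=True)
```

Returning the order instead of assigning a global also removes the
need for the `global sort_order` statements.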

> +sort_order = default_sort_order
> +options = None
> +
> +def parse_cmdline_options():
> +    global sort_order
> +    global options
> +
> +    option_list = [
> +        make_option("-s", "--sort", dest="sort",
> +                    help="Comma separated custom sort order. Allowed values: " +
> +                         ", ".join(allowed_sort_keys))
> +    ]
> +
> +    parser = OptionParser(option_list=option_list)
> +    (options, args) = parser.parse_args()
> +
> +    if (options.sort):
> +        sort_err = 0
> +        temp = []
> +        for sort_option in options.sort.split(","):
> +            if sort_option not in allowed_sort_keys:
> +                print("ERROR: Invalid sort option: %s" % sort_option)
> +                print("       Falling back to default sort order.")
> +                sort_err = 1
> +                break
> +            else:
> +                temp.append(sort_option)
> +
> +        if (sort_err == 0):
> +            sort_order = tuple(temp)
> +
> +parse_cmdline_options()
> +
> +data = {};
> +
> +def init_data_element(symbol, cpumode, dso):

Consider adding type hints and using mypy? Fwiw, I sent this (reviewed but not merged):
https://lore.kernel.org/lkml/20241025172303.77538-1-irogers@google.com/
which adds build support for mypy and pylint, although not enabled by
default given the number of errors.
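
As an illustration of what typed code could look like here, a minimal
sketch that replaces the per-symbol dict with a dataclass. The class
name `FetchStats` is hypothetical and dataclasses need Python 3.7+;
the field names mirror the dict keys in the patch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FetchStats:
    """Per-symbol counters, mirroring the keys of the dict in the patch."""
    cpumode: str
    dso: str
    nr_samples: int = 0
    oc_miss: int = 0
    ic_miss: int = 0
    l2_miss: int = 0
    l3_miss: int = 0
    abort: int = 0
    l1_itlb_miss: int = 0
    l2_itlb_miss: int = 0
    lat: List[int] = field(default_factory=list)

data: Dict[str, FetchStats] = {}

def init_data_element(symbol: str, cpumode: str, dso: str) -> FetchStats:
    """Create the entry for symbol on first use and return it."""
    if symbol not in data:
        data[symbol] = FetchStats(cpumode=cpumode, dso=dso)
    return data[symbol]

stats = init_data_element("_int_malloc", "U",
                          "/usr/lib/x86_64-linux-gnu/libc.so.6")
stats.nr_samples += 1
stats.lat.append(20)
```

With this shape a misspelled counter such as `stats.nr_sample` is
something mypy can flag as an unknown attribute, whereas a misspelled
dict key only fails at run time. The sort key would then use
`getattr(stats, key)` rather than indexing.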

> +    # XXX: Should the key be dso:symbol ?
> +    data[symbol] = {
> +        'nr_samples': 0,
> +        'cpumode': cpumode,
> +
> +        'oc_miss': 0,
> +        'ic_miss': 0,
> +        'l2_miss': 0,
> +        'l3_miss': 0,
> +        'lat': [],
> +
> +        'abort': 0,
> +
> +        'l1_itlb_miss': 0,
> +        'l2_itlb_miss': 0,
> +
> +        # Misc data
> +        'dso': dso,
> +    }
> +
> +def get_cpumode(cpumode):
> +    if (cpumode == 1):
> +        return 'K'
> +    if (cpumode == 2):
> +        return 'U'
> +    if (cpumode == 3):
> +        return 'H'
> +    if (cpumode == 4):
> +        return 'GK'
> +    if (cpumode == 5):
> +        return 'GU'
> +    return '?'

Perhaps use a dictionary? Something like:
```
def get_cpumode(cpumode: int) -> str:
    modes = {
        1: 'K',
        2: 'U',
        3: 'H',
        4: 'GK',
        5: 'GU',
    }
    # dict.get() returns the fallback for any cpumode not listed above
    return modes.get(cpumode, '?')
```

> +
> +def is_oc_miss(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_OC_MISS_SHIFT) & 0x1
> +
> +def is_ic_miss(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_IC_MISS_SHIFT) & 0x1
> +
> +def is_l2_miss(fetch_ctl):
> +    return ((fetch_ctl >> IBS_FETCH_CTL_L2_MISS_SHIFT) & 0x1 and
> +            (fetch_ctl >> IBS_FETCH_CTL_FETCH_COMP) & 0x1)
> +
> +def is_l3_miss(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_L3_MISS_SHIFT) & 0x1
> +
> +def get_fetch_lat(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_FETCH_LAT_SHIFT) & 0xffff
> +
> +def is_l1_itlb_miss(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_L1_ITLB_MISS_SHIFT) & 0x1
> +
> +def is_l2_itlb_miss(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_L2_ITLB_MISS_SHIFT) & 0x1
> +
> +def is_comp(fetch_ctl):
> +    return (fetch_ctl >> IBS_FETCH_CTL_FETCH_COMP) & 0x1
> +
> +def process_event(param_dict):
> +    raw_buf = param_dict['raw_buf']
> +    fetch_ctl = int.from_bytes(raw_buf[4:12], "little")
> +
> +    if ('symbol' in param_dict):
> +        symbol = param_dict['symbol']
> +        symbol = re.sub(r'\(.*\)', '', symbol)
> +    else:
> +        symbol = hex(param_dict['sample']['ip'])
> +
> +    if (symbol not in data):
> +        init_data_element(symbol, get_cpumode(param_dict['sample']['cpumode']),
> +                          param_dict['dso'] if 'dso' in param_dict else "")
> +
> +    data[symbol]['nr_samples'] += 1
> +
> +    if (is_oc_miss(fetch_ctl)):
> +        data[symbol]['oc_miss'] += 1
> +        if (is_ic_miss(fetch_ctl)):
> +            data[symbol]['ic_miss'] += 1
> +            latency = get_fetch_lat(fetch_ctl)
> +            data[symbol]['lat'].append(latency)
> +            if (is_l2_miss(fetch_ctl)):
> +                data[symbol]['l2_miss'] += 1
> +                if (is_l3_miss(fetch_ctl)):
> +                    data[symbol]['l3_miss'] += 1
> +
> +    if (is_l1_itlb_miss(fetch_ctl)):
> +        data[symbol]['l1_itlb_miss'] += 1
> +        if (is_l2_itlb_miss(fetch_ctl)):
> +            data[symbol]['l2_itlb_miss'] += 1
> +
> +    if (is_comp(fetch_ctl) == 0):
> +        data[symbol]['abort'] += 1
> +
> +def print_sort_order():
> +    global sort_order
> +    print("Sort Order: " + ",".join(sort_order))
> +
> +def print_header():
> +    print_sort_order()
> +    print("%-45s| %7s | %7s %9s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s | %7s %9s %7s %9s | %s" %
> +          ("","Nr", "", "", "", "", "", "", "", "", "90th", "Avg", "Fetch", "", "L1Itlb", "", "L2Itlb", "", ""))
> +    print("%-45s| %7s | %7s %9s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s | %7s %9s %7s %9s | %s" %
> +          ("function", "Samples", "OcMiss", "(%)", "IcMiss", "(%)", "L2Miss", "(%)",
> +           "L3Miss", "(%)", "PctLat", "Lat", "Abort", "(%)", "Miss", "(%)", "Miss", "(%)", "dso"))

I believe the more pythonic way these days is to use f-strings:
```
# Alignment is explicit: '<' matches '%-45s', '>' matches '%7s' / '%9s'
# (f-strings left-align str by default, and '-' is not valid for str).
print(f"{'':<45s}| {'Nr':>7s} | {'':>7s} {'':>9s} {'':>7s} {'':>9s} {'':>7s} "
      f"{'':>9s} {'':>7s} {'':>9s} {'90th':>7s} {'Avg':>7s} | {'Fetch':>7s} {'':>9s} "
      f"| {'L1Itlb':>7s} {'':>9s} {'L2Itlb':>7s} {'':>9s} |")
print(f"{'function':<45s}| {'Samples':>7s} | {'OcMiss':>7s} {'(%)':>9s} "
      f"{'IcMiss':>7s} {'(%)':>9s} {'L2Miss':>7s} {'(%)':>9s} {'L3Miss':>7s} "
      f"{'(%)':>9s} {'PctLat':>7s} {'Lat':>7s} | {'Abort':>7s} {'(%)':>9s} | "
      f"{'Miss':>7s} {'(%)':>9s} {'Miss':>7s} {'(%)':>9s} | {'dso':s}")
```
but this all feels a bit error prone. Perhaps add a helper function
with named arguments and let that call print.

> +    print("-----------------------------------------------------------------------------"
> +          "-----------------------------------------------------------------------------"
> +          "------------------------------------------------------------------")
> +
> +def print_footer():
> +    print("-----------------------------------------------------------------------------"
> +          "-----------------------------------------------------------------------------"
> +          "------------------------------------------------------------------")
> +    print()
> +
> +def sort_fun(item):
> +    global sort_order
> +
> +    temp = []
> +    for sort_option in sort_order:
> +        temp.append(item[1][sort_option])
> +    return tuple(temp)
> +
> +def trace_end():
> +    sorted_data = sorted(data.items(), key = sort_fun, reverse = True)
> +
> +    print_header()
> +
> +    for d in sorted_data:
> +        symbol_cpumode = d[0] + " [" + d[1]['cpumode'] + "]"
> +
> +        oc_miss_perc = (d[1]['oc_miss'] * 100) / float(d[1]['nr_samples'])
> +        ic_miss_perc = (d[1]['ic_miss'] * 100) / float(d[1]['nr_samples'])
> +        l2_miss_perc = (d[1]['l2_miss'] * 100) / float(d[1]['nr_samples'])
> +        l3_miss_perc = (d[1]['l3_miss'] * 100) / float(d[1]['nr_samples'])
> +        abort_perc = (d[1]['abort'] * 100) / float(d[1]['nr_samples'])
> +        l1_itlb_miss_perc = (d[1]['l1_itlb_miss'] * 100) / float(d[1]['nr_samples'])
> +        l2_itlb_miss_perc = (d[1]['l2_itlb_miss'] * 100) / float(d[1]['nr_samples'])
> +
> +        avg_lat = 0
> +        pct_lat = 0
> +        if (d[1]['lat']):
> +            avg_lat = sum(d[1]['lat']) / float(len(d[1]['lat']))
> +            pct_lat = np.percentile(d[1]['lat'], 90)
> +
> +        print("%-45s| %7d | %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%)"
> +              " %7d %7d | %7d (%6.2f%%) | %7d (%6.2f%%) %7d (%6.2f%%) | %s" %
> +              (symbol_cpumode, d[1]['nr_samples'], d[1]['oc_miss'], oc_miss_perc,
> +               d[1]['ic_miss'], ic_miss_perc, d[1]['l2_miss'], l2_miss_perc,
> +               d[1]['l3_miss'], l3_miss_perc, pct_lat, avg_lat, d[1]['abort'],
> +               abort_perc, d[1]['l1_itlb_miss'], l1_itlb_miss_perc,
> +               d[1]['l2_itlb_miss'], l2_itlb_miss_perc, d[1]['dso']))

Fwiw, I'm letting Gemini convert these to f-strings. If I trust the AI, this becomes:
```
# int(): pct_lat and avg_lat can be floats, and unlike '%7d' the 'd'
# format code rejects floats.
print(f"{symbol_cpumode:<45s}| {d[1]['nr_samples']:7d} | "
      f"{d[1]['oc_miss']:7d} ({oc_miss_perc:6.2f}%) {d[1]['ic_miss']:7d} "
      f"({ic_miss_perc:6.2f}%) {d[1]['l2_miss']:7d} ({l2_miss_perc:6.2f}%) "
      f"{d[1]['l3_miss']:7d} ({l3_miss_perc:6.2f}%) {int(pct_lat):7d} {int(avg_lat):7d} "
      f"| {d[1]['abort']:7d} ({abort_perc:6.2f}%) | {d[1]['l1_itlb_miss']:7d} "
      f"({l1_itlb_miss_perc:6.2f}%) {d[1]['l2_itlb_miss']:7d} "
      f"({l2_itlb_miss_perc:6.2f}%) | {d[1]['dso']:s}")
```
But given that keeping all these prints in sync is error prone, I
think a helper function is the way to go.
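
A possible shape for such a helper, as a sketch only: `format_row` and
`pair` are hypothetical names, and the column widths are copied from
the format strings in the patch. Header lines and data lines share one
layout, so they cannot drift apart.

```python
def pair(count, total):
    """Render a counter and its share of total as two text cells."""
    pct = 100.0 * count / total if total else 0.0
    return str(count), f"({pct:6.2f}%)"

def format_row(*, name="", samples="", oc_miss=("", ""), ic_miss=("", ""),
               l2_miss=("", ""), l3_miss=("", ""), pct_lat="", avg_lat="",
               abort=("", ""), l1_itlb=("", ""), l2_itlb=("", ""), dso=""):
    """Lay out one table line; every argument is keyword-only."""
    def cells(p):
        # One "count (percent)" column pair, right-aligned like %7s %9s.
        return f"{p[0]:>7} {p[1]:>9}"
    return (f"{name:<45}| {samples:>7} | {cells(oc_miss)} {cells(ic_miss)} "
            f"{cells(l2_miss)} {cells(l3_miss)} {pct_lat:>7} {avg_lat:>7} | "
            f"{cells(abort)} | {cells(l1_itlb)} {cells(l2_itlb)} | {dso}")

header = format_row(name="function", samples="Samples",
                    oc_miss=("OcMiss", "(%)"), ic_miss=("IcMiss", "(%)"),
                    l2_miss=("L2Miss", "(%)"), l3_miss=("L3Miss", "(%)"),
                    pct_lat="PctLat", avg_lat="Lat", abort=("Abort", "(%)"),
                    l1_itlb=("Miss", "(%)"), l2_itlb=("Miss", "(%)"), dso="dso")

# Data line for _int_malloc, using the counts from the example output.
n = 1379
row = format_row(name="_int_malloc [U]", samples=n,
                 oc_miss=pair(407, n), ic_miss=pair(130, n),
                 l2_miss=pair(1, n), l3_miss=pair(0, n),
                 pct_lat=20, avg_lat=14, abort=pair(0, n),
                 l1_itlb=pair(11, n), l2_itlb=pair(5, n),
                 dso="/usr/lib/x86_64-linux-gnu/libc.so.6")
print(header)
print(row)
```

Callers pass latencies as ints (e.g. `int(pct_lat)`), since the helper
only pads and aligns whatever it is given.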

> +
> +    print_footer()
> diff --git a/tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py b/tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py
> new file mode 100644
> index 000000000000..beef6a302258
> --- /dev/null
> +++ b/tools/perf/scripts/python/amd-ibs-op-metrics-annotate.py
> @@ -0,0 +1,342 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2025 Advanced Micro Devices, Inc.
> +#
> +# Print various metric events at instruction granularity using AMD IBS Op PMU.
> +
> +from __future__ import print_function

Feedback here generally matches that above.

> +import os
> +import sys
> +import re
> +import numpy as np
> +from optparse import OptionParser, make_option
> +import subprocess
> +
> +# To avoid BrokenPipeError when redirecting output to head/less etc.
> +from signal import signal, SIGPIPE, SIG_DFL
> +signal(SIGPIPE,SIG_DFL)
> +
> +# IBS OP DATA bit positions
> +IBS_OPDATA_BR_TAKEN_SHIFT       = 35
> +IBS_OPDATA_BR_MISS_SHIFT        = 36
> +IBS_OPDATA_BR_RET_SHIFT         = 37
> +
> +# IBS OP DATA2 bit positions
> +IBS_OPDATA2_DATA_SRC_LOW_SHIFT  = 0
> +IBS_OPDATA2_DATA_SRC_HIGH_SHIFT = 6
> +
> +# IBS OP DATA3 bit positions
> +IBS_OPDATA3_LDOP_SHIFT          = 0
> +IBS_OPDATA3_STOP_SHIFT          = 1
> +IBS_OPDATA3_L1_DTLB_MISS_SHIFT  = 2
> +IBS_OPDATA3_L2_DTLB_MISS_SHIFT  = 3
> +IBS_OPDATA3_DC_MISS_SHIFT       = 7
> +IBS_OPDATA3_L2_MISS_SHIFT       = 20
> +IBS_OPDATA3_DC_MISS_LAT_SHIFT   = 32
> +IBS_OPDATA3_PHYADDR_VAL_SHIFT   = 18
> +IBS_OPDATA3_DTLB_MISS_LAT_SHIFT = 48
> +
> +INSN_SIZE_INVAL = -1
> +
> +annotate_symbol = None
> +annodate_dso = None

annotate_dso?

> +
> +#total_samples = 0
> +data = []
> +
> +def parse_cmdline_options():
> +    global annotate_symbol
> +    global annodate_dso
> +    global sort_order
> +    global options
> +
> +    option_list = [
> +        make_option("-d", "--dso", dest="dso",
> +                    help="Path of binary or a library the symbol belongs to"),
> +        make_option("-s", "--symbol", dest="symbol",
> +                    help="Symbol name")
> +    ]
> +
> +    parser = OptionParser(option_list=option_list)
> +    (options, args) = parser.parse_args()
> +
> +    if (options.dso):
> +        annodate_dso = options.dso
> +    else:
> +        print("Error: Invalid dso path.\n")
> +        exit()
> +
> +    if (options.symbol):
> +        annotate_symbol = options.symbol
> +    else:
> +        print("Error: Invalid symbol.\n")
> +        exit()
> +
> +def disassemble_symbol(symbol, dso):
> +    global data
> +
> +    readelf = subprocess.Popen(["readelf", "-WsC", "--sym-base=16", dso],
> +                               stdout=subprocess.PIPE, text=True)
> +    grep = subprocess.Popen(["grep", "-w", symbol], stdin=readelf.stdout,
> +                            stdout=subprocess.PIPE, text=True)
> +    output, error = grep.communicate()

Perhaps pyelftools would be better here?
https://eli.thegreenplace.net/2012/01/06/pyelftools-python-library-for-parsing-elf-and-dwarf
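
For illustration, a minimal sketch of the symbol lookup with
pyelftools instead of readelf plus grep. It assumes pyelftools is
installed; `find_symbol_range` is a hypothetical name, and taking the
first match is an assumption. Unlike `readelf -C`, this does not
demangle C++ names.

```python
try:
    from elftools.elf.elffile import ELFFile  # third-party: pyelftools
except ImportError:
    ELFFile = None

def find_symbol_range(dso_path, symbol):
    """Return (start_addr, stop_addr) of symbol in the ELF file, or None."""
    if ELFFile is None:
        raise RuntimeError("pyelftools is not installed")
    with open(dso_path, "rb") as f:
        elf = ELFFile(f)
        # Try the full symbol table first, then the dynamic one.
        for section_name in (".symtab", ".dynsym"):
            section = elf.get_section_by_name(section_name)
            if section is None:
                continue
            matches = section.get_symbol_by_name(symbol)
            if matches:
                sym = matches[0]  # assumption: the first match is the one wanted
                start = sym["st_value"]
                return start, start + sym["st_size"]
    return None
```

The returned pair can feed objdump's --start-address and --stop-address
directly, which avoids the regex over readelf's column layout.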

> +
> +    if (error != None):
> +        print("Error reading symbol table data for '%s'" % (symbol))
> +        exit()
> +
> +    match = re.search(r'([^\s]+):\s([^\s]+)\s([^\s]+)\s([^\s]+)\s+([^\s]+)\s([^\s]+)\s+([^\s]+)\s([^\s]+)', output)
> +    if (match == None):
> +        print("Can not find start address / size of '%s'" % (symbol))
> +        exit()
> +
> +    start_addr = int(match.group(2), 16)
> +    size = int(match.group(3), 16)
> +    stop_addr = start_addr + size
> +
> +    objdump = subprocess.run(["objdump", "-d", "-C", "--no-show-raw-insn",
> +                              "--start-address", hex(start_addr), "--stop-address",
> +                              hex(stop_addr), dso], capture_output = True, text = True)
> +    if (objdump.returncode == 1):
> +        print("Error dissassembling '%s'" % (symbol))
> +        exit()
> +
> +    disasm = objdump.stdout.split("\n")
> +
> +    header_lines = 1
> +    # hex(<number>) will convert <number> to hex with 0x prefix. But objdump
> +    # addresses skips 0x, so use alternative format(<number>, 'x') which
> +    # converts <number> to hex without 0x prefix.
> +    start_addr_regex = r"^\s*" + format(start_addr, 'x') + r":"
> +    idx = 0;
> +    for line in disasm:
> +        if (header_lines and (not re.match(start_addr_regex, line))):
> +            continue
> +        header_lines = 0
> +
> +        match = re.search(r'\s*([^:]+):[\t\s]+(.*)', line)
> +        if (match == None):
> +            continue
> +
> +        addr = int(match.group(1), 16)
> +        offset = addr - start_addr
> +        insn = re.sub(r'(<.*>)|(\s+#.*)|(\s+$)', '', match.group(2))
> +
> +        data.append({
> +            'addr': addr,
> +            'insn_size': INSN_SIZE_INVAL,
> +            'symoff': offset,
> +            'insn': insn,
> +
> +            'nr_samples': 0,
> +
> +            # Branch data
> +            'br_ret': 0,
> +            'br_miss': 0,
> +            'br_taken': 0,
> +            'br_fallth': 0,
> +
> +            # Load / Store data
> +            'ld_cnt': 0, # LdOp=1 && StOp=1 are only added int ld_cnt
> +            'st_cnt': 0,
> +            'dc_miss': 0,
> +            'l2_miss': 0,
> +            'l3_miss': 0,
> +            # XXX: Breakdown beyond L3 ?
> +            'dc_miss_lat': [],
> +
> +            'l1_dtlb_miss': 0,
> +            'l2_dtlb_miss': 0,
> +            'dtlb_miss_lat': [],
> +        })
> +
> +        if (idx > 0):
> +            data[idx - 1]['insn_size'] = (data[idx]['addr'] -
> +                                          data[idx - 1]['addr']);
> +        idx += 1
> +
> +parse_cmdline_options()
> +disassemble_symbol(annotate_symbol, annodate_dso)
> +
> +def get_cpumode(cpumode):
> +    if (cpumode == 1):
> +        return 'K'
> +    if (cpumode == 2):
> +        return 'U'
> +    if (cpumode == 3):
> +        return 'H'
> +    if (cpumode == 4):
> +        return 'GK'
> +    if (cpumode == 5):
> +        return 'GU'
> +    return '?'
> +
> +def is_br_ret(op_data):
> +    return (op_data >> IBS_OPDATA_BR_RET_SHIFT) & 0x1
> +
> +def is_br_miss(op_data):
> +    return (op_data >> IBS_OPDATA_BR_MISS_SHIFT) & 0x1
> +
> +def is_br_taken(op_data):
> +    return (op_data >> IBS_OPDATA_BR_TAKEN_SHIFT) & 0x1
> +
> +def is_ld_op(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_LDOP_SHIFT) & 0x1
> +
> +def is_st_op(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_STOP_SHIFT) & 0x1
> +
> +def is_dc_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_DC_MISS_SHIFT) & 0x1
> +
> +def get_dc_miss_lat(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_DC_MISS_LAT_SHIFT) & 0xffff
> +
> +def is_l2_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_L2_MISS_SHIFT) & 0x1
> +
> +def get_data_src(op_data2):
> +    data_src_high = (op_data2 >> IBS_OPDATA2_DATA_SRC_HIGH_SHIFT) & 0x3
> +    data_src_low = (op_data2 >> IBS_OPDATA2_DATA_SRC_LOW_SHIFT) & 0x7
> +    return (data_src_high << 3) | data_src_low
> +
> +def is_phy_addr_val(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_PHYADDR_VAL_SHIFT) & 0x1
> +
> +def is_l1_dtlb_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_L1_DTLB_MISS_SHIFT) & 0x1
> +
> +def get_dtlb_miss_lat(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_DTLB_MISS_LAT_SHIFT) & 0xffff
> +
> +def is_l2_dtlb_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_L2_DTLB_MISS_SHIFT) & 0x1
> +
> +def process_event(param_dict):
> +    global data
> +
> +    raw_buf = param_dict['raw_buf']
> +    op_data = int.from_bytes(raw_buf[20:28], "little")
> +    op_data2 = int.from_bytes(raw_buf[28:36], "little")
> +    op_data3 = int.from_bytes(raw_buf[36:44], "little")
> +
> +    if ('symbol' not in param_dict):
> +        return
> +
> +    symbol = param_dict['symbol']
> +    symbol = re.sub(r'\(.*\)', '', symbol)
> +
> +    if (symbol != annotate_symbol):
> +        return
> +
> +    symoff = 0
> +    if ('symoff' in param_dict):
> +        symoff = param_dict['symoff']
> +
> +    idx = 0
> +    for d in data:
> +        if (d['symoff'] <= symoff and
> +            (d['insn_size'] == INSN_SIZE_INVAL or
> +             d['symoff'] + d['insn_size'] > symoff)):
> +            break
> +        else:
> +            idx += 1
> +
> +    d = data[idx]
> +
> +    d['nr_samples'] += 1
> +    #total_samples += 1
> +
> +    if (is_br_ret(op_data)):
> +        d['br_ret'] += 1
> +        if (is_br_miss(op_data)):
> +            d['br_miss'] += 1
> +        if (is_br_taken(op_data)):
> +            d['br_taken'] += 1
> +
> +    ld_st = 0
> +    if (is_ld_op(op_data3)):
> +        d['ld_cnt'] += 1
> +        ld_st = 1
> +    elif (is_st_op(op_data3)):
> +        d['st_cnt'] += 1
> +        ld_st = 1
> +
> +    if (ld_st == 1):
> +        if (is_dc_miss(op_data3)):
> +            d['dc_miss'] += 1
> +            dc_miss_lat = get_dc_miss_lat(op_data3)
> +            d['dc_miss_lat'].append(dc_miss_lat)
> +            if (is_l2_miss(op_data3)):
> +                d['l2_miss'] += 1
> +                if (get_data_src(op_data2) > 1):
> +                    d['l3_miss'] += 1
> +        if (is_phy_addr_val(op_data3)):
> +            if (is_l1_dtlb_miss(op_data3)):
> +                d['l1_dtlb_miss'] += 1
> +                dtlb_miss_lat = get_dtlb_miss_lat(op_data3)
> +                d['dtlb_miss_lat'].append(dtlb_miss_lat)
> +                if (is_l2_dtlb_miss(op_data3)):
> +                    d['l2_dtlb_miss'] += 1
> +
> +def print_header():
> +    addr_width = len(format(data[0]['addr'], 'x')) + 32
> +    pattern = ("%-" + str(addr_width) + "s | %7s | %7s %7s %9s %7s %9s %7s %9s %7s"
> +               " %7s | %7s %9s %7s %9s %7s %7s | %15s %9s")
> +    print(pattern % ("", "Nr", "", "", "", "", "", "", "", "90th", "Avg", "L1Dtlb", "",
> +                    "L2Dtlb", "", "90th", "Avg", "Branch", ""))
> +    print(pattern % ("Disassembly", "Samples", "LdSt", "DcMiss", "(%)", "L2Miss", "(%)",
> +                     "L3Miss", "(%)", "PctLat", "Lat", "Miss", "(%)", "Miss", "(%)",
> +                     "PctLat", "Lat", "Miss/Retired", "(%)"))
> +    print("--------------------------------------------------------------------------------------"
> +          "--------------------------------------------------------------------------------------"
> +          "------------------------------------------------")
> +
> +def print_footer():
> +    print("Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples")
> +    print("--------------------------------------------------------------------------------------"
> +          "--------------------------------------------------------------------------------------"
> +          "------------------------------------------------")
> +def trace_end():
> +    global data
> +
> +    print_header()
> +
> +    for d in data:
> +        dc_miss_perc = 0
> +        l2_miss_perc = 0
> +        l3_miss_perc = 0
> +        l1_dtlb_miss_perc = 0
> +        l2_dtlb_miss_perc = 0
> +        avg_dc_miss_lat = 0
> +        pct_dc_miss_lat = 0
> +        avg_dtlb_miss_lat = 0
> +        pct_dtlb_miss_lat = 0
> +        if (d['ld_cnt'] or d['st_cnt']):
> +            dc_miss_perc = (d['dc_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
> +            l2_miss_perc = (d['l2_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
> +            l3_miss_perc = (d['l3_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
> +            l1_dtlb_miss_perc = (d['l1_dtlb_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
> +            l2_dtlb_miss_perc = (d['l2_dtlb_miss'] * 100) / float(d['ld_cnt'] + d['st_cnt'])
> +            if (d['dc_miss_lat']):
> +                avg_dc_miss_lat = sum(d['dc_miss_lat']) / float(len(d['dc_miss_lat']))
> +                pct_dc_miss_lat = np.percentile(d['dc_miss_lat'], 90)
> +            if (d['dtlb_miss_lat']):
> +                avg_dtlb_miss_lat = sum(d['dtlb_miss_lat']) / float(len(d['dtlb_miss_lat']))
> +                pct_dtlb_miss_lat = np.percentile(d['dtlb_miss_lat'], 90)
> +
> +        br_miss_perc = 0
> +        if (d['br_ret']):
> +            br_miss_perc = (d['br_miss'] * 100) / float(d['br_ret'])
> +
> +        print("%x: %-30s | %7d | %7d %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%)"
> +              " %7d %7d | %7d (%6.2f%%) %7d (%6.2f%%) %7d %7d | %7d/%-7d (%6.2f%%)" %
> +              (d['addr'], d['insn'], d['nr_samples'], d['ld_cnt'] + d['st_cnt'],
> +               d['dc_miss'], dc_miss_perc, d['l2_miss'], l2_miss_perc,
> +               d['l3_miss'], l3_miss_perc, pct_dc_miss_lat, avg_dc_miss_lat,
> +               d['l1_dtlb_miss'], l1_dtlb_miss_perc, d['l2_dtlb_miss'],
> +               l2_dtlb_miss_perc, pct_dtlb_miss_lat, avg_dtlb_miss_lat,
> +               d['br_miss'], d['br_ret'], br_miss_perc))
> +
> +    print_footer()
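
One observation on the lookup loop in process_event() above: if no entry in `data` covers `symoff`, `idx` ends up equal to `len(data)` and `data[idx]` raises IndexError. Whether that can happen depends on how `data` is built earlier in the file. Below is a minimal sketch of a guarded lookup that uses the same matching condition as the loop. The helper name `find_insn`, the sentinel value chosen for `INSN_SIZE_INVAL`, and the sample entries are assumptions for illustration, not taken from the patch.

```python
# Hypothetical sketch: find_insn() is not part of the patch.
INSN_SIZE_INVAL = -1  # assumed sentinel; the patch defines the real value elsewhere


def find_insn(data, symoff):
    """Return the first entry whose byte range covers symoff, or None."""
    for d in data:
        starts_before = d['symoff'] <= symoff
        size_unknown = d['insn_size'] == INSN_SIZE_INVAL
        # Same condition as the quoted loop: unknown size matches any
        # later offset, otherwise symoff must fall inside the instruction.
        if starts_before and (size_unknown or
                              d['symoff'] + d['insn_size'] > symoff):
            return d
    return None


# Made-up instruction list: two instructions at offsets 0 and 4.
insns = [
    {'symoff': 0, 'insn_size': 4, 'nr_samples': 0},
    {'symoff': 4, 'insn_size': 2, 'nr_samples': 0},
]

hit = find_insn(insns, 5)
if hit is not None:
    hit['nr_samples'] += 1

# Offset 10 is past the last instruction, so nothing matches.
miss = find_insn(insns, 10)
```

With this shape the caller can return early when the lookup yields None instead of indexing past the end of the list.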
> diff --git a/tools/perf/scripts/python/amd-ibs-op-metrics.py b/tools/perf/scripts/python/amd-ibs-op-metrics.py
> new file mode 100644
> index 000000000000..67c0b2f9d79a
> --- /dev/null
> +++ b/tools/perf/scripts/python/amd-ibs-op-metrics.py
> @@ -0,0 +1,285 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2025 Advanced Micro Devices, Inc.
> +#
> +# Print various metric events at function granularity using AMD IBS Op PMU.
> +
> +from __future__ import print_function

Again similar feedback to the other files.

Thanks,
Ian

> +
> +import os
> +import sys
> +import re
> +import numpy as np
> +from optparse import OptionParser, make_option
> +
> +# To avoid BrokenPipeError when redirecting output to head/less etc.
> +from signal import signal, SIGPIPE, SIG_DFL
> +signal(SIGPIPE, SIG_DFL)
> +
> +# IBS OP DATA bit positions
> +IBS_OPDATA_BR_TAKEN_SHIFT       = 35
> +IBS_OPDATA_BR_MISS_SHIFT        = 36
> +IBS_OPDATA_BR_RET_SHIFT         = 37
> +
> +# IBS OP DATA2 bit positions
> +IBS_OPDATA2_DATA_SRC_LOW_SHIFT  = 0
> +IBS_OPDATA2_DATA_SRC_HIGH_SHIFT = 6
> +
> +# IBS OP DATA3 bit positions
> +IBS_OPDATA3_LDOP_SHIFT          = 0
> +IBS_OPDATA3_STOP_SHIFT          = 1
> +IBS_OPDATA3_L1_DTLB_MISS_SHIFT  = 2
> +IBS_OPDATA3_L2_DTLB_MISS_SHIFT  = 3
> +IBS_OPDATA3_DC_MISS_SHIFT       = 7
> +IBS_OPDATA3_L2_MISS_SHIFT       = 20
> +IBS_OPDATA3_DC_MISS_LAT_SHIFT   = 32
> +IBS_OPDATA3_PHYADDR_VAL_SHIFT   = 18
> +IBS_OPDATA3_DTLB_MISS_LAT_SHIFT = 48
> +
> +allowed_sort_keys = ("nr_samples", "dc_miss", "l2_miss", "l3_miss", "l1_dtlb_miss", "l2_dtlb_miss", "br_miss")
> +default_sort_order = ("nr_samples",) # Trailing comma is needed for a single-member tuple
> +sort_order = default_sort_order
> +options = None
> +
> +def parse_cmdline_options():
> +    global sort_order
> +    global options
> +
> +    option_list = [
> +        make_option("-s", "--sort", dest="sort",
> +                    help="Comma separated custom sort order. Allowed values: " +
> +                         ", ".join(allowed_sort_keys))
> +    ]
> +
> +    parser = OptionParser(option_list=option_list)
> +    (options, args) = parser.parse_args()
> +
> +    if (options.sort):
> +        sort_err = 0
> +        temp = []
> +        for sort_option in options.sort.split(","):
> +            if sort_option not in allowed_sort_keys:
> +                print("ERROR: Invalid sort option: %s" % sort_option)
> +                print("       Falling back to default sort order.")
> +                sort_err = 1
> +                break
> +            else:
> +                temp.append(sort_option)
> +
> +        if (sort_err == 0):
> +            sort_order = tuple(temp)
> +
> +parse_cmdline_options()
> +
> +# Final data
> +data = {}
> +
> +def init_data_element(symbol, cpumode, dso):
> +    # XXX: Should the key be dso:symbol ?
> +    data[symbol] = {
> +        'nr_samples': 0,
> +        'cpumode': cpumode,
> +
> +        # Branch data
> +        'br_ret': 0,
> +        'br_miss': 0,
> +        'br_taken': 0,
> +        'br_fallth': 0,
> +
> +        # Load / Store data
> +        'ld_cnt': 0, # Ops with LdOp=1 && StOp=1 are only counted in ld_cnt
> +        'st_cnt': 0,
> +        'dc_miss': 0,
> +        'l2_miss': 0,
> +        'l3_miss': 0,
> +        # XXX: Breakdown beyond L3 ?
> +        'dc_miss_lat': [],
> +
> +        'l1_dtlb_miss': 0,
> +        'l2_dtlb_miss': 0,
> +        'dtlb_miss_lat': [],
> +
> +        # Misc data
> +        'dso': dso,
> +    }
> +
> +def get_cpumode(cpumode):
> +    if (cpumode == 1):
> +        return 'K'
> +    if (cpumode == 2):
> +        return 'U'
> +    if (cpumode == 3):
> +        return 'H'
> +    if (cpumode == 4):
> +        return 'GK'
> +    if (cpumode == 5):
> +        return 'GU'
> +    return '?'
> +
> +def is_br_ret(op_data):
> +    return (op_data >> IBS_OPDATA_BR_RET_SHIFT) & 0x1
> +
> +def is_br_miss(op_data):
> +    return (op_data >> IBS_OPDATA_BR_MISS_SHIFT) & 0x1
> +
> +def is_br_taken(op_data):
> +    return (op_data >> IBS_OPDATA_BR_TAKEN_SHIFT) & 0x1
> +
> +def is_ld_op(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_LDOP_SHIFT) & 0x1
> +
> +def is_st_op(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_STOP_SHIFT) & 0x1
> +
> +def is_dc_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_DC_MISS_SHIFT) & 0x1
> +
> +def get_dc_miss_lat(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_DC_MISS_LAT_SHIFT) & 0xffff
> +
> +def is_l2_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_L2_MISS_SHIFT) & 0x1
> +
> +def get_data_src(op_data2):
> +    data_src_high = (op_data2 >> IBS_OPDATA2_DATA_SRC_HIGH_SHIFT) & 0x3
> +    data_src_low = (op_data2 >> IBS_OPDATA2_DATA_SRC_LOW_SHIFT) & 0x7
> +    return (data_src_high << 3) | data_src_low
> +
> +def is_phy_addr_val(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_PHYADDR_VAL_SHIFT) & 0x1
> +
> +def is_l1_dtlb_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_L1_DTLB_MISS_SHIFT) & 0x1
> +
> +def get_dtlb_miss_lat(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_DTLB_MISS_LAT_SHIFT) & 0xffff
> +
> +def is_l2_dtlb_miss(op_data3):
> +    return (op_data3 >> IBS_OPDATA3_L2_DTLB_MISS_SHIFT) & 0x1
> +
> +def process_event(param_dict):
> +    raw_buf = param_dict['raw_buf']
> +    op_data = int.from_bytes(raw_buf[20:28], "little")
> +    op_data2 = int.from_bytes(raw_buf[28:36], "little")
> +    op_data3 = int.from_bytes(raw_buf[36:44], "little")
> +
> +    if ('symbol' in param_dict):
> +        symbol = param_dict['symbol']
> +        symbol = re.sub(r'\(.*\)', '', symbol)
> +    else:
> +        symbol = hex(param_dict['sample']['ip'])
> +
> +    if (symbol not in data):
> +        init_data_element(symbol, get_cpumode(param_dict['sample']['cpumode']),
> +                          param_dict['dso'] if 'dso' in param_dict else "")
> +
> +    data[symbol]['nr_samples'] += 1
> +
> +    if (is_br_ret(op_data)):
> +        data[symbol]['br_ret'] += 1
> +        if (is_br_miss(op_data)):
> +            data[symbol]['br_miss'] += 1
> +        if (is_br_taken(op_data)):
> +            data[symbol]['br_taken'] += 1
> +
> +    ld_st = 0
> +    if (is_ld_op(op_data3)):
> +        data[symbol]['ld_cnt'] += 1
> +        ld_st = 1
> +    elif (is_st_op(op_data3)):
> +        data[symbol]['st_cnt'] += 1
> +        ld_st = 1
> +
> +    if (ld_st == 1):
> +        if (is_dc_miss(op_data3)):
> +            data[symbol]['dc_miss'] += 1
> +            dc_miss_lat = get_dc_miss_lat(op_data3)
> +            data[symbol]['dc_miss_lat'].append(dc_miss_lat)
> +            if (is_l2_miss(op_data3)):
> +                data[symbol]['l2_miss'] += 1
> +                if (get_data_src(op_data2) > 1):
> +                    data[symbol]['l3_miss'] += 1
> +        if (is_phy_addr_val(op_data3)):
> +            if (is_l1_dtlb_miss(op_data3)):
> +                data[symbol]['l1_dtlb_miss'] += 1
> +                dtlb_miss_lat = get_dtlb_miss_lat(op_data3)
> +                data[symbol]['dtlb_miss_lat'].append(dtlb_miss_lat)
> +                if (is_l2_dtlb_miss(op_data3)):
> +                    data[symbol]['l2_dtlb_miss'] += 1
> +
> +def print_sort_order():
> +    global sort_order
> +    print("Sort Order: " + ",".join(sort_order))
> +
> +def print_header():
> +    print_sort_order()
> +    print("Percentages: Cache miss and TLB miss %es are wrt NrLdSt not NrSamples")
> +    print("%-45s| %7s | %7s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s %7s %9s %7s %7s | %15s %9s | %s" %
> +          ("","Nr", "Nr", "", "", "", "", "", "", "90th", "Avg", "L1Dtlb", "", "L2Dtlb", "", "90th",
> +           "Avg", "Branch", "", ""))
> +    print("%-45s| %7s | %7s %7s %9s %7s %9s %7s %9s %7s %7s | %7s %9s %7s %9s %7s %7s | %15s %9s | %s" %
> +          ("function","Samples", "LdSt", "DcMiss", "(%)", "L2Miss", "(%)", "L3Miss", "(%)",
> +           "PctLat", "Lat", "Miss", "(%)", "Miss", "(%)", "PctLat", "Lat", "Miss/Retired", "(%)", "dso"))
> +    print("--------------------------------------------------------------------------------------"
> +          "--------------------------------------------------------------------------------------"
> +          "----------------------------------------------------------------")
> +
> +def print_footer():
> +    print("--------------------------------------------------------------------------------------"
> +          "--------------------------------------------------------------------------------------"
> +          "----------------------------------------------------------------")
> +    print()
> +
> +def sort_fun(item):
> +    global sort_order
> +
> +    temp = []
> +    for sort_option in sort_order:
> +        temp.append(item[1][sort_option])
> +    return tuple(temp)
> +
> +def trace_end():
> +    sorted_data = sorted(data.items(), key = sort_fun, reverse = True)
> +
> +    print_header()
> +
> +    for d in sorted_data:
> +        symbol_cpumode = d[0] + " [" + d[1]['cpumode'] + "]"
> +
> +        dc_miss_perc = 0
> +        l2_miss_perc = 0
> +        l3_miss_perc = 0
> +        l1_dtlb_miss_perc = 0
> +        l2_dtlb_miss_perc = 0
> +        avg_dc_miss_lat = 0
> +        pct_dc_miss_lat = 0
> +        avg_dtlb_miss_lat = 0
> +        pct_dtlb_miss_lat = 0
> +        if (d[1]['ld_cnt'] or d[1]['st_cnt']):
> +            dc_miss_perc = (d[1]['dc_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
> +            l2_miss_perc = (d[1]['l2_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
> +            l3_miss_perc = (d[1]['l3_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
> +            l1_dtlb_miss_perc = (d[1]['l1_dtlb_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
> +            l2_dtlb_miss_perc = (d[1]['l2_dtlb_miss'] * 100) / float(d[1]['ld_cnt'] + d[1]['st_cnt'])
> +            if (d[1]['dc_miss_lat']):
> +                avg_dc_miss_lat = sum(d[1]['dc_miss_lat']) / float(len(d[1]['dc_miss_lat']))
> +                pct_dc_miss_lat = np.percentile(d[1]['dc_miss_lat'], 90)
> +            if (d[1]['dtlb_miss_lat']):
> +                avg_dtlb_miss_lat = sum(d[1]['dtlb_miss_lat']) / float(len(d[1]['dtlb_miss_lat']))
> +                pct_dtlb_miss_lat = np.percentile(d[1]['dtlb_miss_lat'], 90)
> +
> +        br_miss_perc = 0
> +        if (d[1]['br_ret']):
> +            br_miss_perc = (d[1]['br_miss'] * 100) / float(d[1]['br_ret'])
> +
> +        print("%-45s| %7d | %7d %7d (%6.2f%%) %7d (%6.2f%%) %7d (%6.2f%%)"
> +              " %7d %7d | %7d (%6.2f%%) %7d (%6.2f%%) %7d %7d | %7d/%-7d (%6.2f%%) | %s" %
> +              (symbol_cpumode, d[1]['nr_samples'],
> +               d[1]['ld_cnt'] + d[1]['st_cnt'], d[1]['dc_miss'], dc_miss_perc,
> +               d[1]['l2_miss'], l2_miss_perc, d[1]['l3_miss'], l3_miss_perc,
> +               pct_dc_miss_lat, avg_dc_miss_lat, d[1]['l1_dtlb_miss'],
> +               l1_dtlb_miss_perc, d[1]['l2_dtlb_miss'], l2_dtlb_miss_perc,
> +               pct_dtlb_miss_lat, avg_dtlb_miss_lat,
> +               d[1]['br_miss'], d[1]['br_ret'], br_miss_perc, d[1]['dso']))
> +
> +    print_footer()
> --
> 2.43.0
>
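
For anyone following the bit manipulation in the quoted helpers, here is a minimal standalone sketch of how the three IBS op registers are sliced out of raw_buf and how single-bit and multi-bit fields are extracted. The shift values and byte offsets are copied from the patch above. The names `extract_field` and `decode_op_regs` and the synthetic buffer are made up for illustration and are not part of the patch or of perf.

```python
# Illustrative only: extract_field() and decode_op_regs() are hypothetical
# helpers. Shifts and byte offsets are taken from the quoted patch.
IBS_OPDATA_BR_RET_SHIFT = 37
IBS_OPDATA3_LDOP_SHIFT = 0
IBS_OPDATA3_DC_MISS_SHIFT = 7
IBS_OPDATA3_DC_MISS_LAT_SHIFT = 32


def extract_field(reg, shift, width=1):
    """Return the width-bit field of reg that starts at bit position shift."""
    return (reg >> shift) & ((1 << width) - 1)


def decode_op_regs(raw_buf):
    """Split raw_buf into (op_data, op_data2, op_data3) as the patch does."""
    op_data = int.from_bytes(raw_buf[20:28], "little")
    op_data2 = int.from_bytes(raw_buf[28:36], "little")
    op_data3 = int.from_bytes(raw_buf[36:44], "little")
    return op_data, op_data2, op_data3


# Synthetic sample: a retired branch, plus a load that missed the data
# cache with a latency field of 100.
op_data_in = 1 << IBS_OPDATA_BR_RET_SHIFT
op_data3_in = ((1 << IBS_OPDATA3_LDOP_SHIFT) |
               (1 << IBS_OPDATA3_DC_MISS_SHIFT) |
               (100 << IBS_OPDATA3_DC_MISS_LAT_SHIFT))
raw_buf = (bytes(20) +                        # bytes 0..19, unused here
           op_data_in.to_bytes(8, "little") + # bytes 20..27
           bytes(8) +                         # bytes 28..35 (op_data2 = 0)
           op_data3_in.to_bytes(8, "little")) # bytes 36..43

op_data, op_data2, op_data3 = decode_op_regs(raw_buf)
br_ret = extract_field(op_data, IBS_OPDATA_BR_RET_SHIFT)
dc_miss = extract_field(op_data3, IBS_OPDATA3_DC_MISS_SHIFT)
dc_miss_lat = extract_field(op_data3, IBS_OPDATA3_DC_MISS_LAT_SHIFT, 16)
print(br_ret, dc_miss, dc_miss_lat)
```

A single parameterised extractor like this would behave the same as the per-field `is_*()` / `get_*()` helpers in the patch, where `width=16` corresponds to the `0xffff` mask.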