Message-ID: <Z9jLngEKQpkZdqXQ@google.com>
Date: Mon, 17 Mar 2025 18:25:50 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Li Huafei <lihuafei1@...wei.com>
Cc: acme@...nel.org, leo.yan@...ux.dev, james.clark@...aro.org,
mark.rutland@....com, john.g.garry@...cle.com, will@...nel.org,
irogers@...gle.com, mike.leach@...aro.org, peterz@...radead.org,
mingo@...hat.com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, kjain@...ux.ibm.com, mhiramat@...nel.org,
atrajeev@...ux.vnet.ibm.com, sesse@...gle.com,
adrian.hunter@...el.com, kan.liang@...ux.intel.com,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 0/7] Add data type profiling support for arm64
Hello,
On Sat, Mar 15, 2025 at 12:21:30AM +0800, Li Huafei wrote:
> Hi,
>
> This patchset supports arm64 perf data type profiling. Data type
> profiling was introduced by Namhyung [1], which associates PMU sampling
> (here referring to memory access-related event sampling) with the
> referenced data types, providing developers with an effective tool for
> analyzing the impact of memory usage and layout. For more detailed
> background, please refer to [2].
Thanks a lot for working on this! I'm glad to see it running on more
architectures! I'll review and leave comments on each patch.
Thanks,
Namhyung
>
> Namhyung initially supported this feature only on x86, and later Athira
> added support for it on powerpc [3]. Unlike the x86 implementation, the
> powerpc implementation parses operands directly from raw instruction
> code instead of using the results from assembler disassembly. As Athira
> mentioned, this is mainly because not all memory access instructions on
> powerpc have explicit memory reference assembler notations '()' in their
> assembly code. On arm64, all memory access instructions have the
> notation '[]', so my implementation is similar to x86, using the
> disassembly results from objdump, llvm, or libcapstone, and parsing
> based on strings. I believe this has the advantage of reusing the
> complex instruction parsing logic of the assembler, but it may not
> perform as well as raw instruction parsing in terms of efficiency.
>
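For illustration only, string-based parsing of the '[]' notation could look
roughly like the sketch below. It is not code from this patch set: the
struct mem_operand type and parse_arm64_mem_operand() are hypothetical
names, and register-offset forms such as "[x0, x1]" are deliberately
rejected, in line with the limitation described for Patch 4. The only
outside fact it relies on is the AArch64 DWARF register numbering
(x0-x30 map to 0-30, sp maps to 31).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; not taken from the patch set. */
struct mem_operand {
	int base_reg;	/* DWARF register number: x0-x30 -> 0-30, sp -> 31 */
	long offset;	/* immediate offset of the access, 0 if absent */
};

/*
 * Parse the first memory operand found in a disassembly string.
 * Handles "[xN]", "[xN, #imm]", "[xN, #imm]!" and the same with "sp".
 * For post-index forms such as "[xN], #imm" the access itself uses
 * offset 0, which is what gets reported.
 * Returns 0 on success, -1 for anything this sketch does not handle.
 */
static int parse_arm64_mem_operand(const char *s, struct mem_operand *out)
{
	const char *p = strchr(s, '[');
	char *end;

	if (p == NULL)
		return -1;
	p++;

	if (strncmp(p, "sp", 2) == 0) {
		out->base_reg = 31;
		p += 2;
	} else if (*p == 'x' && isdigit((unsigned char)p[1])) {
		long n = strtol(p + 1, &end, 10);

		if (n < 0 || n > 30)
			return -1;
		out->base_reg = (int)n;
		p = end;
	} else {
		return -1;
	}

	out->offset = 0;
	if (*p == ']')
		return 0;
	if (*p != ',')
		return -1;
	p++;
	while (*p == ' ')
		p++;
	if (*p != '#')
		return -1;	/* register offset or another unsupported form */

	/* Base 0 accepts both "#16" and "#0x10", with an optional sign. */
	out->offset = strtol(p + 1, &end, 0);
	if (end == p + 1 || *end != ']')
		return -1;
	return 0;
}
```

Passing a whole line such as "ldr x0, [x19, #8]" works as well, since the
function starts at the first '['. A raw-instruction decoder would avoid the
string handling entirely, which is the efficiency trade-off mentioned above.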
> Below is a brief description of this patchset:
> - Patch 1 first identifies load and store instructions and provides a
> parsing function.
> - Patches 2-3 are refactoring patches. They primarily move the code for
> extracting registers and offsets to specific architecture
> implementations. Additionally, a new callback function
> 'extract_reg_offset' is introduced to avoid having too many
> architecture-specific implementations in the function
> 'annotate_get_insn_location()'.
> - Patch 4 implements the extract_reg_offset callback for arm64.
> Currently, it does not support parsing instructions with register
> pairs or register offsets in operands. Register pairs often appear in
> stack push/pop instructions, and register offsets are common when
> accessing per-CPU variables, both of which require special handling.
> - Patch 5 adds support for instruction tracking on arm64, primarily
> addressing the issue where DWARF does not generate information for
> intermediate pointers in pointer chains.
> - Patches 6-7 further enhance instruction tracking. Patch 6 supports
> parsing accesses to global variables, while Patch 7 focuses on
> resolving accesses to the kernel's current pointer.
>
> There are still areas for improvement in the current implementation:
> - Support more types of memory access instructions, such as those
> involving register pairs and register offsets.
> - Handle all data processing instructions (e.g., mov, add), as these
> instructions can change the state of registers and may affect the
> accuracy of instruction tracking.
> - Support parsing of special memory access scenarios, such as per-CPU
> variables and arrays.
>
> The patch set is based on 6.14-rc6 (commit 80e54e84911a). After applying
> this patch set, the data type profiling results on arm64 are as follows
> (SPE support is required):
>
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio
> Only instruction-based sampling period is currently supported by Arm SPE.
> Annotate data type stats:
> total 556, ok 357 (64.2%), bad 199 (35.8%)
> -----------------------------------------------------------
> 10 : no_sym
> 36 : no_insn_ops
> 65 : no_var
> 70 : no_typeinfo
> 18 : bad_offset
> 59 : insn_track
>
> Annotate type: 'struct rq' in [kernel.kallsyms] (29 samples):
> ============================================================================
> Percent offset   size  field
>  100.00      0  0xe80  struct rq {
>    0.00      0    0x4      raw_spinlock_t __lock {
>    0.00      0    0x4          arch_spinlock_t raw_lock {
>    0.00      0    0x4              union {
>    0.00      0    0x4                  atomic_t val {
>    0.00      0    0x4                      int counter;
>                                        };
>    0.00      0    0x2                  struct {
>    0.00      0    0x1                      u8 locked;
>    0.00    0x1    0x1                      u8 pending;
>                                        };
>    0.00      0    0x4                  struct {
>    0.00      0    0x2                      u16 locked_pending;
>    0.00    0x2    0x2                      u16 tail;
>                                        };
>                                    };
>                                };
>                            };
>   13.79    0x4    0x4      unsigned int nr_running;
>   13.79    0x8    0x4      unsigned int nr_numa_running;
>    0.00    0xc    0x4      unsigned int nr_preferred_running;
>    0.00   0x10    0x4      unsigned int numa_migrate_on;
>    0.00   0x18    0x8      long unsigned int last_blocked_load_update_tick;
>    0.00   0x20    0x4      unsigned int has_blocked_load;
>    0.00   0x40   0x20      call_single_data_t nohz_csd {
>    0.00   0x40   0x10          struct __call_single_node node {
>    0.00   0x40    0x8              struct llist_node llist {
>    0.00   0x40    0x8                  struct llist_node* next;
>                                    };
>    0.00   0x48    0x4              union {
>    0.00   0x48    0x4                  unsigned int u_flags;
>    0.00   0x48    0x4                  atomic_t a_flags {
>    0.00   0x48    0x4                      int counter;
>                                        };
>                                    };
> ...
>
> Thanks,
> Huafei
>
> [1] https://lore.kernel.org/lkml/20231213001323.718046-1-namhyung@kernel.org/
> [2] https://lwn.net/Articles/955709/
> [3] https://lore.kernel.org/all/20240718084358.72242-1-atrajeev@linux.vnet.ibm.com/#r
>
> Li Huafei (7):
> perf annotate: Handle arm64 load and store instructions
> perf annotate: Advance the mem_ref check to mov__parse()
> perf annotate: Add 'extract_reg_offset' callback function to extract
> register number and access offset
> perf annotate: Support for the 'extract_reg_offset' callback function
> in arm64
> perf annotate-data: Support instruction tracking for arm64
> perf annotate-data: Handle arm64 global variable access
> perf annotate-data: Handle the access to the 'current' pointer on
> arm64
>
> tools/perf/arch/arm64/annotate/instructions.c | 302 +++++++++++++++++-
> .../perf/arch/powerpc/annotate/instructions.c | 10 +
> tools/perf/arch/x86/annotate/instructions.c | 99 ++++++
> tools/perf/util/Build | 1 +
> tools/perf/util/annotate-data.c | 23 +-
> tools/perf/util/annotate-data.h | 4 +-
> tools/perf/util/annotate.c | 112 +------
> tools/perf/util/disasm.c | 14 +
> tools/perf/util/disasm.h | 4 +
> tools/perf/util/dwarf-regs-arm64.c | 25 ++
> tools/perf/util/include/dwarf-regs.h | 7 +
> 11 files changed, 490 insertions(+), 111 deletions(-)
> create mode 100644 tools/perf/util/dwarf-regs-arm64.c
>
> --
> 2.25.1
>