Message-ID: <Z9jLngEKQpkZdqXQ@google.com>
Date: Mon, 17 Mar 2025 18:25:50 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Li Huafei <lihuafei1@...wei.com>
Cc: acme@...nel.org, leo.yan@...ux.dev, james.clark@...aro.org,
	mark.rutland@....com, john.g.garry@...cle.com, will@...nel.org,
	irogers@...gle.com, mike.leach@...aro.org, peterz@...radead.org,
	mingo@...hat.com, alexander.shishkin@...ux.intel.com,
	jolsa@...nel.org, kjain@...ux.ibm.com, mhiramat@...nel.org,
	atrajeev@...ux.vnet.ibm.com, sesse@...gle.com,
	adrian.hunter@...el.com, kan.liang@...ux.intel.com,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linux-perf-users@...r.kernel.org
Subject: Re: [PATCH 0/7] Add data type profiling support for arm64

Hello,

On Sat, Mar 15, 2025 at 12:21:30AM +0800, Li Huafei wrote:
> Hi,
> 
> This patchset adds perf data type profiling support for arm64. Data type
> profiling, introduced by Namhyung [1], associates PMU samples (here,
> samples of memory access-related events) with the data types they
> reference, giving developers an effective tool for analyzing the impact
> of memory usage and layout. For more detailed background, please refer
> to [2].

Thanks a lot for working on this!  I'm glad to see it running on more
architectures!  I'll review and leave comments on each patch.

Thanks,
Namhyung

> 
> Namhyung initially supported this feature only on x86, and later Athira
> added support for it on powerpc [3]. Unlike the x86 implementation, the
> powerpc implementation parses operands directly from raw instruction
> code instead of using the results from assembler disassembly. As Athira
> mentioned, this is mainly because not all memory access instructions on
> powerpc have explicit memory reference assembler notations '()' in their
> assembly code. On arm64, all memory access instructions have the
> notation '[]', so my implementation is similar to x86, using the
> disassembly results from objdump, llvm, or libcapstone, and parsing
> based on strings. I believe this has the advantage of reusing the
> complex instruction parsing logic of the assembler, but it may not
> perform as well as raw instruction parsing in terms of efficiency.
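
To illustrate the string-based approach described above, here is a minimal
sketch of pulling a base register and an immediate offset out of the '[]'
notation in a disassembled operand string. It is not the code from the
patchset: 'struct mem_operand' and 'parse_mem_operand()' are hypothetical
names, and, like the limitation noted for patch 4, register-offset forms
such as "[x0, x1]" are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; not perf's actual data structure. */
struct mem_operand {
	char base[8];	/* base register name, e.g. "x0" or "sp" */
	long offset;	/* immediate offset, 0 if none is given */
};

/*
 * Find "[<base>]" or "[<base>, #<imm>]" in an operand string such as
 * "x1, [x0, #16]". Returns false when there is no memory reference or
 * when the offset is not an immediate (e.g. "[x0, x1]").
 * Writeback markers ("!") and post-index immediates are ignored.
 */
static bool parse_mem_operand(const char *ops, struct mem_operand *out)
{
	const char *open, *close, *p;
	size_t len;
	char *end;

	assert(ops != NULL && out != NULL);

	open = strchr(ops, '[');
	if (!open)
		return false;
	close = strchr(open, ']');
	if (!close)
		return false;

	/* The base register runs up to the first ',' or ']'. */
	p = open + 1;
	len = strcspn(p, ",]");
	if (len == 0 || len >= sizeof(out->base))
		return false;
	memcpy(out->base, p, len);
	out->base[len] = '\0';
	out->offset = 0;

	p += len;
	if (*p == ']')
		return true;		/* plain "[x0]" */

	/* Skip the ',' and any spaces, then expect "#<imm>". */
	p++;
	while (*p == ' ')
		p++;
	if (*p != '#')
		return false;		/* register offset, not handled */
	p++;

	out->offset = strtol(p, &end, 0);
	return end != p && end == close;
}
```

With this sketch, "x1, [x0, #16]" yields base "x0" and offset 16, "w2, [sp]"
yields base "sp" and offset 0, and "x3, [x0, x1]" is rejected.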
> 
> Below is a brief description of this patchset:
>  - Patch 1 first identifies load and store instructions and provides a
>    parsing function.
>  - Patches 2-3 are refactoring patches. They primarily move the code for
>    extracting registers and offsets to specific architecture
>    implementations. Additionally, a new callback function
>    'extract_reg_offset' is introduced to avoid having too many
>    architecture-specific implementations in the function
>    'annotate_get_insn_location()'.
>  - Patch 4 implements the extract_reg_offset callback for arm64.
>    Currently, it does not support parsing instructions with register
>    pairs or register offsets in operands. Register pairs often appear in
>    stack push/pop instructions, and register offsets are common when
>    accessing per-CPU variables, both of which require special handling.
>  - Patch 5 adds support for instruction tracking on arm64, primarily
>    addressing the issue where DWARF does not generate information for
>    intermediate pointers in pointer chains.
>  - Patches 6-7 further enhance instruction tracking. Patch 6 supports
>    parsing accesses to global variables, while Patch 7 focuses on
>    resolving accesses to the kernel's current pointer.
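
The callback idea from patches 2-4 can be sketched roughly as follows. This
is only an illustration under assumptions: 'struct arch_ops', 'struct
insn_loc', 'get_insn_location()' and the callback signature are hypothetical
and do not reflect the actual perf structures or the signature used in the
patches. The generic code stays free of per-architecture parsing and just
dispatches to whatever the architecture registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical location record: register number plus byte offset. */
struct insn_loc {
	int reg;
	long offset;
};

/* Hypothetical per-architecture operations table. */
struct arch_ops {
	const char *name;
	/* Returns 0 on success, -1 if the operand cannot be parsed. */
	int (*extract_reg_offset)(const char *operand, struct insn_loc *loc);
};

/*
 * Toy arm64 callback: accepts only the "[xN, #imm]" form.
 * On arm64, DWARF register numbers 0-30 correspond to x0-x30,
 * so N can be used directly as the register number here.
 */
static int arm64_extract_reg_offset(const char *operand, struct insn_loc *loc)
{
	int reg;
	long offset;

	if (sscanf(operand, "[x%d, #%ld]", &reg, &offset) != 2)
		return -1;
	if (reg < 0 || reg > 30)
		return -1;

	loc->reg = reg;
	loc->offset = offset;
	return 0;
}

static const struct arch_ops arm64_ops = {
	.name = "arm64",
	.extract_reg_offset = arm64_extract_reg_offset,
};

/* An architecture that has not registered a callback. */
static const struct arch_ops other_ops = {
	.name = "other",
};

/* Generic entry point: no architecture-specific parsing lives here. */
static int get_insn_location(const struct arch_ops *arch,
			     const char *operand, struct insn_loc *loc)
{
	assert(arch != NULL && operand != NULL && loc != NULL);

	if (!arch->extract_reg_offset)
		return -1;
	return arch->extract_reg_offset(operand, loc);
}
```

Calling get_insn_location(&arm64_ops, "[x19, #24]", &loc) in this sketch
fills in register 19 and offset 24, while an architecture without the
callback gets -1 back.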
> 
> There are still areas for improvement in the current implementation:
>  - Support more types of memory access instructions, such as those
>    involving register pairs and register offsets.
>  - Handle all data processing instructions (e.g., mov, add), as these
>    instructions can change the state of registers and may affect the
>    accuracy of instruction tracking.
>  - Support parsing of special memory access scenarios like per-CPU
>    variables and arrays.
> 
> The patch set is based on 6.14-rc6 (commit 80e54e84911a). After applying
> this patch set, the data type profiling results on arm64 are as follows
> (SPE support is required):
> 
>  # perf mem record -a -K -- sleep 1
>  # perf annotate --data-type --type-stat --stdio
>  Only instruction-based sampling period is currently supported by Arm SPE.
>  Annotate data type stats:
>  total 556, ok 357 (64.2%), bad 199 (35.8%)
>  -----------------------------------------------------------
>          10 : no_sym
>          36 : no_insn_ops
>          65 : no_var
>          70 : no_typeinfo
>          18 : bad_offset
>          59 : insn_track
>  
>  Annotate type: 'struct rq' in [kernel.kallsyms] (29 samples):
>  ============================================================================
>   Percent     offset       size  field
>    100.00          0      0xe80  struct rq        {
>      0.00          0        0x4      raw_spinlock_t      __lock {
>      0.00          0        0x4          arch_spinlock_t raw_lock {
>      0.00          0        0x4              union        {
>      0.00          0        0x4                  atomic_t        val {
>      0.00          0        0x4                      int counter;
>                                                  };
>      0.00          0        0x2                  struct   {
>      0.00          0        0x1                      u8  locked;
>      0.00        0x1        0x1                      u8  pending;
>                                                  };
>      0.00          0        0x4                  struct   {
>      0.00          0        0x2                      u16 locked_pending;
>      0.00        0x2        0x2                      u16 tail;
>                                                  };
>                                              };
>                                          };
>                                      };
>     13.79        0x4        0x4      unsigned int        nr_running;
>     13.79        0x8        0x4      unsigned int        nr_numa_running;
>      0.00        0xc        0x4      unsigned int        nr_preferred_running;
>      0.00       0x10        0x4      unsigned int        numa_migrate_on;
>      0.00       0x18        0x8      long unsigned int   last_blocked_load_update_tick;
>      0.00       0x20        0x4      unsigned int        has_blocked_load;
>      0.00       0x40       0x20      call_single_data_t  nohz_csd {
>      0.00       0x40       0x10          struct __call_single_node       node {
>      0.00       0x40        0x8              struct llist_node   llist {
>      0.00       0x40        0x8                  struct llist_node*      next;
>                                              };
>      0.00       0x48        0x4              union        {
>      0.00       0x48        0x4                  unsigned int    u_flags;
>      0.00       0x48        0x4                  atomic_t        a_flags {
>      0.00       0x48        0x4                      int counter;
>                                                  };
>                                              };
>      ...
> 
> Thanks,
> Huafei
> 
> [1] https://lore.kernel.org/lkml/20231213001323.718046-1-namhyung@kernel.org/
> [2] https://lwn.net/Articles/955709/
> [3] https://lore.kernel.org/all/20240718084358.72242-1-atrajeev@linux.vnet.ibm.com/#r
> 
> Li Huafei (7):
>   perf annotate: Handle arm64 load and store instructions
>   perf annotate: Advance the mem_ref check to mov__parse()
>   perf annotate: Add 'extract_reg_offset' callback function to extract
>     register number and access offset
>   perf annotate: Support for the 'extract_reg_offset' callback function
>     in arm64
>   perf annotate-data: Support instruction tracking for arm64
>   perf annotate-data: Handle arm64 global variable access
>   perf annotate-data: Handle the access to the 'current' pointer on
>     arm64
> 
>  tools/perf/arch/arm64/annotate/instructions.c | 302 +++++++++++++++++-
>  .../perf/arch/powerpc/annotate/instructions.c |  10 +
>  tools/perf/arch/x86/annotate/instructions.c   |  99 ++++++
>  tools/perf/util/Build                         |   1 +
>  tools/perf/util/annotate-data.c               |  23 +-
>  tools/perf/util/annotate-data.h               |   4 +-
>  tools/perf/util/annotate.c                    | 112 +------
>  tools/perf/util/disasm.c                      |  14 +
>  tools/perf/util/disasm.h                      |   4 +
>  tools/perf/util/dwarf-regs-arm64.c            |  25 ++
>  tools/perf/util/include/dwarf-regs.h          |   7 +
>  11 files changed, 490 insertions(+), 111 deletions(-)
>  create mode 100644 tools/perf/util/dwarf-regs-arm64.c
> 
> -- 
> 2.25.1
> 
