lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <5CBFBE42-A15C-47A4-B980-62313FEF1EFA@linux.vnet.ibm.com>
Date: Fri, 12 Jul 2024 09:14:55 +0530
From: Athira Rajeev <atrajeev@...ux.vnet.ibm.com>
To: Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Ian Rogers <irogers@...gle.com>, Namhyung Kim <namhyung@...nel.org>,
        Segher Boessenkool <segher@...nel.crashing.org>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-perf-users <linux-perf-users@...r.kernel.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>, akanksha@...ux.ibm.com,
        Madhavan Srinivasan <maddy@...ux.ibm.com>,
        Kajol Jain <kjain@...ux.ibm.com>,
        Disha Goel <disgoel@...ux.vnet.ibm.com>
Subject: Re: [PATCH V6 00/18] Add data type profiling support for powerpc



> On 7 Jul 2024, at 8:14 PM, Athira Rajeev <atrajeev@...ux.vnet.ibm.com> wrote:
> 
> The patchset from Namhyung added support for data type profiling
> in perf tool. This enabled support to associate PMU samples to data
> types they refer using DWARF debug information. With the upstream
> perf, currently it possible to run perf report or perf annotate to
> view the data type information on x86.
> 
> Initial patchset posted here had changes need to enable data type
> profiling support for powerpc.
> 

Hi Namhyung,

Requesting for review on this V6 patchset. I have addressed review comments from V5. 
If the patchset looks good, can you please pull this in.

Thanks
Athira
> https://lore.kernel.org/all/6e09dc28-4a2e-49d8-a2b5-ffb3396a9952@csgroup.eu/T/
> 
> Main change were:
> 1. powerpc instruction nmemonic table to associate load/store
> instructions with move_ops which is use to identify if instruction
> is a memory access one.
> 2. To get register number and access offset from the given
> instruction, code uses fields from "struct arch" -> objump.
> Added entry for powerpc here.
> 3. A get_arch_regnum to return register number from the
> register name string.
> 
> But the apporach used in the initial patchset used parsing of
> disassembled code which the current perf tool implementation does.
> 
> Example: lwz     r10,0(r9)
> 
> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> registers names and offset. Also to find whether there is a memory
> reference in the operands, "memory_ref_char" field of objdump is used.
> For x86, "(" is used as memory_ref_char to tackle instructions of the
> form "mov  (%rax), %rcx".
> 
> In case of powerpc, not all instructions using "(" are the only memory
> instructions. Example, above instruction can also be of extended form (X
> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> and extract the source/target registers, second patchset added support to use
> raw instruction. With raw instruction, macros are added to extract opcode
> and register fields.
> Link to second patchset:
> https://lore.kernel.org/all/20240506121906.76639-1-atrajeev@linux.vnet.ibm.com/
> 
> Example representation using --show-raw-insn in objdump gives result:
> 
> 38 01 81 e8     ld      r4,312(r1)
> 
> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> this translates to instruction form: "ld RT,DS(RA)" and binary code
> as:
>  _____________________________________
>  | 58 |  RT  |  RA |      DS       | |
>  -------------------------------------
> 0    6     11    16              30 31
> 
> Second patchset used "objdump" again to read the raw instruction.
> But since there is no need to disassemble and binary code can be read
> directly from the DSO, third patchset (ie this patchset) uses below
> apporach. The apporach preferred in powerpc to parse sample for data
> type profiling in V3 patchset is:
> - Read directly from DSO using dso__data_read_offset
> - If that fails for any case, fallback to using libcapstone
> - If libcapstone is not supported, approach will use objdump
> 
> Patchset adds support to pick the opcode and reg fields from this
> raw/binary instruction code. This approach came in from review comment
> by Segher Boessenkool and Christophe for the initial patchset.
> 
> Apart from that, instruction tracking is enabled for powerpc and
> support function is added to find variables defined as registers
> Example, in powerpc, below two registers are
> defined to represent variable:
> 1. r13: represents local_paca
> register struct paca_struct *local_paca asm("r13");
> 
> 2. r1: represents stack_pointer
> register void *__stack_pointer asm("r1");
> 
> These are handled in this patchset.
> 
> - Patch 1 is to rearrange register state type structures to header file
> so that it can referred from other arch specific files
> - Patch 2 is to make instruction tracking as a callback to"struct arch"
> so that it can be implemented by other archs easily and defined in arch
> specific files
> - Patch 3 is to handle state type regs array size for x86 and powerpc
> - Patch 4 adds support to capture and parse raw instruction in powerpc
> using dso__data_read_offset utility
> - Patch 4 also adds logic to support using objdump when doing default "perf
> report" or "perf annotate" since it that needs disassembled instruction.
> - Patch 5 adds disasm_line__parse to parse raw instruction for powerpc
> - Patch 6 update parameters for reg extract functions to use raw
> instruction on powerpc
> - Patch 7 updates ins__find to carry raw_insn and also adds parse
> callback for memory instructions for powerpc
> - Patch 8 add support to identify memory instructions of opcode 31 in
> powerpc
> - Patch 9 adds more instructions to support instruction tracking in powerpc
> - Patch 10 and 11 handles instruction tracking for powerpc.
> - Patch 12, 13 and 14 add support to use libcapstone in powerpc
> - Patch 15 and patch 16 handles support to find global register variables
> - PAtch 17 updates data type compare functions data_type_cmp and
>  sort__typeoff_sort to include var_name along with type_name in
>  comparison.
> - Patch 18 handles insn-stat option for perf annotate
> 
> Note:
> - There are remaining unknowns (25%) as seen in annotate Instruction stats
> below.
> - This patchset is not tested on powerpc32. In next step of enhancements
> along with handling remaining unknowns, plan to cover powerpc32 changes
> based on how testing goes.
> 
> With the current patchset:
> 
> ./perf record -a -e mem-loads sleep 1
> ./perf report -s type,typeoff --hierarchy --group --stdio
> ./perf annotate --data-type --insn-stat
> 
> perf annotate logs:
> ==================
> 
> Annotate Instruction stats
> total 609, ok 446 (73.2%), bad 163 (26.8%)
> 
>  Name/opcode:  Good   Bad
>  -----------------------------------------------------------
>  58                  :   323    80
>  32                  :    49    43
>  34                  :    33    11
>  OP_31_XOP_LDX       :     8    20
>  40                  :    23     0
>  OP_31_XOP_LWARX     :     5     1
>  OP_31_XOP_LWZX      :     2     3
>  OP_31_XOP_LDARX     :     3     0
>  33                  :     0     2
>  OP_31_XOP_LBZX      :     0     1
>  OP_31_XOP_LWAX      :     0     1
>  OP_31_XOP_LHZX      :     0     1
> 
> perf report logs:
> =================
> 
>  Total Lost Samples: 0
> 
>  Samples: 1K of event 'mem-loads'
>  Event count (approx.): 937238
> 
>  Overhead  Data Type  Data Type Offset
> ........  .........  ................
>    48.60%  (unknown)  (unknown) +0 (no field)
>    11.42%  long unsigned int  long unsigned int +0 (current_stack_pointer)
>     4.68%  struct paca_struct  struct paca_struct +2312 (__current)
>     4.57%  struct paca_struct  struct paca_struct +2354 (irq_soft_mask)
>     2.69%  struct paca_struct  struct paca_struct +2808 (canary)
>     2.68%  struct paca_struct  struct paca_struct +8 (paca_index)
>     2.24%  struct paca_struct  struct paca_struct +48 (data_offset)
>     1.43%  long unsigned int  long unsigned int +0 (no field)
>     1.41%  struct vm_fault  struct vm_fault +0 (vma)
>     1.29%  struct task_struct  struct task_struct +276 (flags)
>     1.03%  struct pt_regs  struct pt_regs +264 (user_regs.msr)
>     0.90%  struct security_hook_list  struct security_hook_list +0 (list.next)
>     0.76%  struct irq_desc  struct irq_desc +304 (irq_data.chip)
>     0.76%  struct rq  struct rq +2856 (cpu)
>     0.72%  long long unsigned int  long long unsigned int +0 (no field)
> 
> Thanks
> Athira Rajeev
> 
> Changelog:
> From v5 -> v6:
> - Addressed review comments from Namhyung
>  Conditionally define TYPE_STATE_MAX_REGS based on arch.
>  Added macro for defining width of the raw codes and spaces
>  in disasm_line__parse_powerpc.
>  Call disasm_line__parse from disasm_line__parse_powerpc
>  for generic code.
>  Renamed symbol__disassemble_dso to symbol__disassemble_raw.
>  Fixed find_data_type_global_reg to correclty free var_types
>  and change indent level.
>  Fixed data_type_cmp and sort__typeoff_sort to include var_name
>  in comparing data type entries.

> 
> From v4 -> v5:
> - Addressed review comments from Namhyung
>  Handle max number of type state regs as 16 for x86 and 32 for
>  powerpc.
>  Added generic support for objdump patch first and DSO read
>  optimisation next
>  combined patch 3 and patch 4 in patchseries V4 to one patch
>  Changed reference for "raw_insn" to use "u32"
>  Splitted "parse" callback patch changes and "ins__find" patch
>  changes into two
>  Instead of making weak function, added get_powerpc_regs to
>  extract register and offset fields for powerpc
> - Addressed complation fail when "dwarf.h" is not present ie
>  elfutils devel is not present. Used includes for #ifdef HAVE_DWARF_SUPPORT
>  when including functions that use Dwarf references. Also
>  conditionally include some of the header files.
> 
> From v3->v4:
> - Addressed review comments from Ian by using capston_init from
>  "util/print_insn.c" instead of "open_capston_handle".
> - Addressed review comment from Namhyung by moving "opcode"
>  field from "struct ins" to "struct disasm_line"
> 
> From v2->v3:
> - Addressed review comments from Christophe and Namhyung for V2
> - Changed the apporach in powerpc to parse sample for data
>  type profiling as:
>  Read directly from DSO using dso__data_read_offset
>  If that fails for any case, fallback to using libcapstone
>  If libcapstone is not supported, approach will use objdump
> - Include instructions with opcode as 31 and correctly categorize
>  them as memory or arithmetic instructions.
> - Include more instructions for instruction tracking in powerpc
> 
> From v1->v2:
> - Addressed suggestion from Christophe Leroy and Segher Boessenkool
>  to use the binary code (raw insn) to fetch opcode, register and
>  offset fields.
> - Added support for instruction tracking in powerpc
> - Find the register defined variables (r13 and r1 which points to
>  local_paca and current_stack_pointer in powerpc)
> 
> Athira Rajeev (18):
>  tools/perf: Move the data structures related to register type to
>    header file
>  tools/perf: Add "update_insn_state" callback function to handle arch
>    specific instruction tracking
>  tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in
>    powerpc
>  tools/perf: Add disasm_line__parse to parse raw instruction for
>    powerpc
>  tools/perf: Add support to capture and parse raw instruction in
>    powerpc using dso__data_read_offset utility
>  tools/perf: Update parameters for reg extract functions to use raw
>    instruction on powerpc
>  tools/perf: Add parse function for memory instructions in powerpc
>  tools/perf: Add support to identify memory instructions of opcode 31
>    in powerpc
>  tools/perf: Add some of the arithmetic instructions to support
>    instruction tracking in powerpc
>  tools/perf: Add more instructions for instruction tracking
>  tools/perf: Update instruction tracking for powerpc
>  tools/perf: Make capstone_init non-static so that it can be used
>    during symbol disassemble
>  tools/perf: Use capstone_init and remove open_capstone_handle from
>    disasm.c
>  tools/perf: Add support to use libcapstone in powerpc
>  tools/perf: Add support to find global register variables using
>    find_data_type_global_reg
>  tools/perf: Add support for global_die to capture name of variable in
>    case of register defined variable
>  tools/perf: Update data_type_cmp and sort__typeoff_sort function to
>    include var_name in comparison
>  tools/perf: Set instruction name to be used with insn-stat when using
>    raw instruction
> 
> tools/include/linux/string.h                  |   2 +
> tools/lib/string.c                            |  13 +
> tools/perf/arch/arm64/annotate/instructions.c |   3 +-
> .../arch/loongarch/annotate/instructions.c    |   6 +-
> .../perf/arch/powerpc/annotate/instructions.c | 254 ++++++++
> tools/perf/arch/powerpc/util/dwarf-regs.c     |  53 ++
> tools/perf/arch/s390/annotate/instructions.c  |   5 +-
> tools/perf/arch/x86/annotate/instructions.c   | 377 ++++++++++++
> tools/perf/builtin-annotate.c                 |   4 +-
> tools/perf/util/annotate-data.c               | 545 ++++--------------
> tools/perf/util/annotate-data.h               |  83 +++
> tools/perf/util/annotate.c                    |  29 +-
> tools/perf/util/annotate.h                    |   6 +-
> tools/perf/util/disasm.c                      | 468 +++++++++++++--
> tools/perf/util/disasm.h                      |  19 +-
> tools/perf/util/dwarf-aux.c                   |   1 +
> tools/perf/util/dwarf-aux.h                   |   1 +
> tools/perf/util/include/dwarf-regs.h          |  12 +
> tools/perf/util/print_insn.c                  |  15 +-
> tools/perf/util/print_insn.h                  |   5 +
> tools/perf/util/sort.c                        |  23 +-
> 21 files changed, 1420 insertions(+), 504 deletions(-)
> 
> -- 
> 2.43.0
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ