lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 23 Feb 2017 16:50:18 +0900
From:   Masami Hiramatsu <mhiramat@...nel.org>
To:     Will Deacon <will.deacon@....com>, He Kuang <hekuang@...wei.com>,
        Wang Nan <wangnan0@...wei.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Mark Rutland <mark.rutland@....com>,
        linux-kernel@...r.kernel.org
Subject: [BUG?] perf: dwarf unwind doesn't work correctly on aarch64

Hello,

perf record -g dwarf (and perf report) doesn't show correct callchain
on aarch64. Here is how to reproduce it.

1) I've prepared an debian8 aarch64 VM on qemu-system-aarch64, and
   build/install the latest perftools on it.

2) Build attached program as below
# gcc -O0 -ggdb3 -funwind-tables -o main main.c

3) Run perf to record with dwarf.
# perf record -g --call-graph dwarf,1024 -e cpu-clock:u -o /tmp/perf.data -- ./main
^C[ perf record: Woken up 35 times to write data ]
[ perf record: Captured and wrote 8.526 MB /tmp/perf.data (6495 samples) ]


4) Run perf to report the result.

# perf report -i /tmp/perf.data
# To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cpu-clock:u'
# Event count (approx.): 1623750000
#
# Children      Self  Command  Shared Object  Symbol                    
# ........  ........  .......  .............  ..........................
#
    17.21%    17.21%  main     main           [.] func2
            |
            ---func2

    17.09%    17.09%  main     main           [.] func1
            |
            ---func1

    16.67%    16.67%  main     main           [.] main
            |
            ---main
.....

So, as you can see, the call graph reported each function has been
called from itself. If I report it with fp as below, perf reported
correct callgraph.

3') record it with fp
# perf record -g --call-graph fp -e cpu-clock:u -o /tmp/perf.data -- ./main 
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.397 MB /tmp/perf.data (4160 samples) ]

4') report it

# perf report -i /tmp/perf.data # To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cpu-clock:u'
# Event count (approx.): 1040000000
#
# Children      Self  Command  Shared Object  Symbol                     
# ........  ........  .......  .............  ...........................
#
    99.06%     0.00%  main     libc-2.19.so   [.] __libc_start_main
            |
            ---__libc_start_main
               |          
                --98.94%--main
                          |          
                          |--80.24%--func0
                          |          |          
                          |          |--63.27%--func1
                          |          |          |          
                          |          |          |--47.04%--func2
                          |          |          |          |         
.....


I tried to dump but it seems correct.

# perf report -D -i /tmp/perf.data
[...]
.  0130:  c8 05 40 00 00 00 00 00 30 e1 a8 df ff ff 00 00  ..@.....0.......
.  0140:  90 05 40 00 00 00 00 00 [00 04 00 00 00 00 00 00](*)  ..@.............
.  0150:  50 e1 a8 df ff ff 00 00 e8 05 40 00 00 00 00 00  P.........@.....
.  0160:  00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00  ................
.  0170:  70 e1 a8 df ff ff 00 00 08 06 40 00 00 00 00 00  p.........@.....
.  0180:  8c b6 c9 8f ff ff 00 00 00 00 00 00 04 00 00 00  ................
.  0190:  90 e1 a8 df ff ff 00 00 28 06 40 00 00 00 00 00  ........(.@.....
.  01a0:  c0 e1 a8 df ff ff 00 00 2c 1d b6 8f 02 00 00 00  ........,.......
.  01b0:  b0 e1 a8 df ff ff 00 00 40 06 40 00 00 00 00 00  ........@.@.....
.  01c0:  c0 e1 a8 df ff ff 00 00 48 1d b6 8f 01 00 00 00  ........H......
[...]
1207680984048 0xf040 [0x560]: PERF_RECORD_SAMPLE(IP, 0x2): 114/114: 0x400590 per
... FP chain: nr:0
... user regs: mask 0x1ffffffff ABI 64-bit
[...]
.... x29   0xffffdfa8e130
.... lr    0x4005c8
.... sp    0xffffdfa8e130
.... pc    0x400590
... ustack: size 1024, offset 0x148
 . data_src: 0x5080021
 ... thread: main:114

In this entry, ustack should start from offset=0x0148 in event raw data,
I marked it with (*), which is the saved stack size(0x400 = 1024) and
the top of stack has 0xffffdfa8e150 which seems next frame pointer, and
after that there is 0x4005e8, which is next return address.

00000000004005b0 <func3>:
  4005b0:       a9be7bfd        stp     x29, x30, [sp,#-32]!
  4005b4:       910003fd        mov     x29, sp
  4005b8:       b9001fa0        str     w0, [x29,#28]
  4005bc:       b9401fa0        ldr     w0, [x29,#28]
  4005c0:       11002000        add     w0, w0, #0x8
  4005c4:       97fffff3        bl      400590 <func4>
  4005c8:       a8c27bfd        ldp     x29, x30, [sp],#32
  4005cc:       d65f03c0        ret

00000000004005d0 <func2>:
  4005d0:       a9be7bfd        stp     x29, x30, [sp,#-32]!
  4005d4:       910003fd        mov     x29, sp
  4005d8:       b9001fa0        str     w0, [x29,#28]
  4005dc:       b9401fa0        ldr     w0, [x29,#28]
  4005e0:       11001000        add     w0, w0, #0x4
  4005e4:       97fffff3        bl      4005b0 <func3>
  4005e8:       a8c27bfd        ldp     x29, x30, [sp],#32

So, the stack data should be correct.

I guess there is a bug in libunwind on aarch64 or we missed to pass
the stack data to libunwind. (BTW, it works correctly on arm32)

Thank you,

-- 
Masami Hiramatsu <mhiramat@...nel.org>

View attachment "main.c" of type "text/x-csrc" (273 bytes)

Powered by blists - more mailing lists