[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20170223165018.5e6c469804e624f15ffbc4b2@kernel.org>
Date: Thu, 23 Feb 2017 16:50:18 +0900
From: Masami Hiramatsu <mhiramat@...nel.org>
To: Will Deacon <will.deacon@....com>, He Kuang <hekuang@...wei.com>,
Wang Nan <wangnan0@...wei.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mark Rutland <mark.rutland@....com>,
linux-kernel@...r.kernel.org
Subject: [BUG?] perf: dwarf unwind doesn't work correctly on aarch64
Hello,
perf record -g dwarf (and perf report) doesn't show correct callchain
on aarch64. Here is how to reproduce it.
1) I've prepared an debian8 aarch64 VM on qemu-system-aarch64, and
build/install the latest perftools on it.
2) Build attached program as below
# gcc -O0 -ggdb3 -funwind-tables -o main main.c
3) Run perf to record with dwarf.
# perf record -g --call-graph dwarf,1024 -e cpu-clock:u -o /tmp/perf.data -- ./main
^C[ perf record: Woken up 35 times to write data ]
[ perf record: Captured and wrote 8.526 MB /tmp/perf.data (6495 samples) ]
4) Run perf to report the result.
# perf report -i /tmp/perf.data
# To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 6K of event 'cpu-clock:u'
# Event count (approx.): 1623750000
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ............. ..........................
#
17.21% 17.21% main main [.] func2
|
---func2
17.09% 17.09% main main [.] func1
|
---func1
16.67% 16.67% main main [.] main
|
---main
.....
So, as you can see, the call graph reported each function has been
called from itself. If I report it with fp as below, perf reported
correct callgraph.
3') record it with fp
# perf record -g --call-graph fp -e cpu-clock:u -o /tmp/perf.data -- ./main
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.397 MB /tmp/perf.data (4160 samples) ]
4') report it
# perf report -i /tmp/perf.data # To display the perf.data header info, please use --header/--header-only option
#
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cpu-clock:u'
# Event count (approx.): 1040000000
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ............. ...........................
#
99.06% 0.00% main libc-2.19.so [.] __libc_start_main
|
---__libc_start_main
|
--98.94%--main
|
|--80.24%--func0
| |
| |--63.27%--func1
| | |
| | |--47.04%--func2
| | | |
.....
I tried to dump but it seems correct.
# perf report -D -i /tmp/perf.data
[...]
. 0130: c8 05 40 00 00 00 00 00 30 e1 a8 df ff ff 00 00 ..@.....0.......
. 0140: 90 05 40 00 00 00 00 00 [00 04 00 00 00 00 00 00](*) ..@.............
. 0150: 50 e1 a8 df ff ff 00 00 e8 05 40 00 00 00 00 00 P.........@.....
. 0160: 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 ................
. 0170: 70 e1 a8 df ff ff 00 00 08 06 40 00 00 00 00 00 p.........@.....
. 0180: 8c b6 c9 8f ff ff 00 00 00 00 00 00 04 00 00 00 ................
. 0190: 90 e1 a8 df ff ff 00 00 28 06 40 00 00 00 00 00 ........(.@.....
. 01a0: c0 e1 a8 df ff ff 00 00 2c 1d b6 8f 02 00 00 00 ........,.......
. 01b0: b0 e1 a8 df ff ff 00 00 40 06 40 00 00 00 00 00 ........@.@.....
. 01c0: c0 e1 a8 df ff ff 00 00 48 1d b6 8f 01 00 00 00 ........H......
[...]
1207680984048 0xf040 [0x560]: PERF_RECORD_SAMPLE(IP, 0x2): 114/114: 0x400590 per
... FP chain: nr:0
... user regs: mask 0x1ffffffff ABI 64-bit
[...]
.... x29 0xffffdfa8e130
.... lr 0x4005c8
.... sp 0xffffdfa8e130
.... pc 0x400590
... ustack: size 1024, offset 0x148
. data_src: 0x5080021
... thread: main:114
In this entry, ustack should start from offset=0x0148 in event raw data,
I marked it with (*), which is the saved stack size(0x400 = 1024) and
the top of stack has 0xffffdfa8e150 which seems next frame pointer, and
after that there is 0x4005e8, which is next return address.
00000000004005b0 <func3>:
4005b0: a9be7bfd stp x29, x30, [sp,#-32]!
4005b4: 910003fd mov x29, sp
4005b8: b9001fa0 str w0, [x29,#28]
4005bc: b9401fa0 ldr w0, [x29,#28]
4005c0: 11002000 add w0, w0, #0x8
4005c4: 97fffff3 bl 400590 <func4>
4005c8: a8c27bfd ldp x29, x30, [sp],#32
4005cc: d65f03c0 ret
00000000004005d0 <func2>:
4005d0: a9be7bfd stp x29, x30, [sp,#-32]!
4005d4: 910003fd mov x29, sp
4005d8: b9001fa0 str w0, [x29,#28]
4005dc: b9401fa0 ldr w0, [x29,#28]
4005e0: 11001000 add w0, w0, #0x4
4005e4: 97fffff3 bl 4005b0 <func3>
4005e8: a8c27bfd ldp x29, x30, [sp],#32
So, the stack data should be correct.
I guess there is a bug in libunwind on aarch64 or we missed to pass
the stack data to libunwind. (BTW, it works correctly on arm32)
Thank you,
--
Masami Hiramatsu <mhiramat@...nel.org>
View attachment "main.c" of type "text/x-csrc" (273 bytes)
Powered by blists - more mailing lists