[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <27703989-e510-c964-2af7-ef0a5611f8cf@huawei.com>
Date: Sat, 27 Mar 2021 10:15:56 +0800
From: Yang Jihong <yangjihong1@...wei.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>
CC: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Yao Jin <yao.jin@...ux.intel.com>, <gustavoars@...nel.org>,
<mliska@...e.cz>, linux-kernel <linux-kernel@...r.kernel.org>,
<zhangjinhao2@...wei.com>
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode
Hello,
On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 26, 2021 at 12:25:37PM +0900, Namhyung Kim escreveu:
>> On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong <yangjihong1@...wei.com> wrote:
>>> On 2021/3/19 20:35, Yang Jihong wrote:
>>>> In hist__find_annotations function, since different hist_entry may point to same
>>>> symbol, we free notes->src to signal already processed this symbol in stdio mode;
>>>> when annotate, entry will skipped if notes->src is NULL to avoid repeated output.
>>>>
>>>> However, there is a problem, for example, run the following command:
>>>>
>>>> # perf record -e branch-misses -e branch-instructions -a sleep 1
>>>>
>>>> perf.data file contains different types of sample event.
>>>>
>>>> If the same IP sample event exists in branch-misses and branch-instructions,
>>>> this event uses the same symbol. When annotate branch-misses events, notes->src
>>>> corresponding to this event is set to null, as a result, when annotate
>>>> branch-instructions events, this event is skipped and no annotate is output.
>>>>
>>>> Solution of this patch is to remove zfree in hists__find_annotations and
>>>> change sort order to "dso,symbol" to avoid duplicate output when different
>>>> processes correspond to the same symbol.
>
>>>> Signed-off-by: Yang Jihong <yangjihong1@...wei.com>
>
>> Acked-by: Namhyung Kim <namhyung@...nel.org>
>
> Without looking at the patch, just at its description of the problem, I
> tried to annotate two events in a group, to get the annotate group view
> output with both events, and it seems I'm getting samples accounted for
> both events:
>
> [root@...e ~]# perf record -e '{branch-misses,branch-instructions}' -a sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.296 MB perf.data (2507 samples) ]
> [root@...e ~]#
> [root@...e ~]# perf report | grep -v '^#' | head -5
> 17.49% 19.19% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarking::Run
> 12.17% 17.04% ThreadPoolForeg chromium-browser [.] v8::internal::Sweeper::RawSweep
> 11.14% 11.63% ThreadPoolForeg chromium-browser [.] v8::internal::MarkingVisitorBase<v8::internal::ConcurrentMarkingVisitor, v8::internal::ConcurrentMarkingState>::ProcessStrongHeapObject<v8::internal::CompressedHeapObjectSlot>
> 7.65% 7.84% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::ShouldVisit
> 5.66% 6.72% ThreadPoolForeg chromium-browser [.] v8::internal::ConcurrentMarkingVisitor::VisitPointersInSnapshot
>
> [root@...e ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
> Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
> v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
> Percent
>
>
> Disassembly of section .text:
>
> 0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
> v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
> push %rbp
> mov %rsp,%rbp
> push %r15
> push %r14
> mov %rdi,%r14
> push %r13
> mov %edx,%r13d
> push %r12
> mov %ecx,%r12d
> push %rbx
> sub $0x1298,%rsp
> mov %rsi,-0x1228(%rbp)
> mov %fs:0x28,%rax
> mov %rax,-0x38(%rbp)
> <SNIP>
> movzwl 0x2(%rbx),%eax
> test %ax,%ax
> ↓ jne 4a9
> mov -0x10e8(%rbp),%rdx
> cmpw $0x0,0x2(%rdx)
> 0.41 0.39 ↓ je 4b90
> movq %rbx,%xmm0
> movq %rdx,%xmm2
> mov %rdx,%rbx
> punpcklqdq %xmm2,%xmm0
> movups %xmm0,-0x10e8(%rbp)
> movzwl 0x2(%rdx),%eax
> 4a9: sub $0x1,%eax
> mov %ax,0x2(%rbx)
> 0.36 0.91 movzwl %ax,%eax
> 0.60 0.00 mov 0x10(%rbx,%rax,8),%rax
> 3.44 2.46 mov %rax,-0x11e0(%rbp)
> 0.00 0.34 4bf: mov 0x8(%r13),%rax
> 0.00 0.36 add $0x1,%r15d
> 0.00 0.34 mov 0x110(%rax),%rax
> mov 0x128(%rax),%rcx
> 0.88 0.36 mov 0x8(%r13),%rax
> mov 0x110(%rax),%rdx
> mov 0x130(%rdx),%rdx
> 0.00 0.48 mov 0x140(%rax),%rax
> mov 0x110(%rax),%rsi
> 0.61 0.47 mov -0x11e0(%rbp),%rax
> 2.01 2.32 sub $0x1,%rax
> cmp %rcx,%rax
> 0.00 0.35 setae %cl
> 1.31 0.33 cmp %rdx,%rax
> setb %dl
> 0.00 0.24 test %dl,%cl
> 0.00 0.12 ↓ jne 4b70
> cmp %rsi,%rax
> ↓ je 4b70
> mov (%rax),%eax
> 29.10 29.90 add -0x1220(%rbp),%rax
> cmpb $0x0,-0x1218(%rbp)
> mov %rax,-0x1210(%rbp)
> 0.00 0.65 ↓ jne 4fa0
> mov -0x11e0(%rbp),%r9
> lea 0x6(%rax),%rbx
> 0.38 0.00 545: movzbl (%rbx),%eax
> 4.90 5.34 cmp $0x4c,%al
> ↓ ja 5026
> 0.58 0.00 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
> movslq (%rdi,%rax,4),%rax
> 4.97 3.19 add %rdi,%rax
> <SNIP>
>
> If I ask for number of samples:
>
> [root@...e ~]# perf config annotate.show_nr_samples=true
> [root@...e ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
>
> Samples: 444 of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
> v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
> Samples
>
>
> Disassembly of section .text:
>
> 0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
> v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
> push %rbp
> mov %rsp,%rbp
> push %r15
> push %r14
> mov %rdi,%r14
> push %r13
> mov %edx,%r13d
> push %r12
> mov %ecx,%r12d
> push %rbx
> sub $0x1298,%rsp
> mov %rsi,-0x1228(%rbp)
> mov %fs:0x28,%rax
> mov %rax,-0x38(%rbp)
> <SNIP>
> movzwl 0x2(%rbx),%eax
> test %ax,%ax
> ↓ jne 4a9
> mov -0x10e8(%rbp),%rdx
> cmpw $0x0,0x2(%rdx)
> 1 1 ↓ je 4b90
> movq %rbx,%xmm0
> movq %rdx,%xmm2
> mov %rdx,%rbx
> punpcklqdq %xmm2,%xmm0
> movups %xmm0,-0x10e8(%rbp)
> movzwl 0x2(%rdx),%eax
> 4a9: sub $0x1,%eax
> mov %ax,0x2(%rbx)
> 1 2 movzwl %ax,%eax
> 1 0 mov 0x10(%rbx,%rax,8),%rax
> 8 5 mov %rax,-0x11e0(%rbp)
> 0 1 4bf: mov 0x8(%r13),%rax
> 0 1 add $0x1,%r15d
> 0 1 mov 0x110(%rax),%rax
> mov 0x128(%rax),%rcx
> 3 1 mov 0x8(%r13),%rax
> mov 0x110(%rax),%rdx
> mov 0x130(%rdx),%rdx
> 0 1 mov 0x140(%rax),%rax
> mov 0x110(%rax),%rsi
> 2 1 mov -0x11e0(%rbp),%rax
> 6 6 sub $0x1,%rax
> cmp %rcx,%rax
> 0 1 setae %cl
> 2 1 cmp %rdx,%rax
> setb %dl
> 0 1 test %dl,%cl
> 0 1 ↓ jne 4b70
> cmp %rsi,%rax
> ↓ je 4b70
> mov (%rax),%eax
> 58 73 add -0x1220(%rbp),%rax
> cmpb $0x0,-0x1218(%rbp)
> mov %rax,-0x1210(%rbp)
> 0 1 ↓ jne 4fa0
> mov -0x11e0(%rbp),%r9
> lea 0x6(%rax),%rbx
> 1 0 545: movzbl (%rbx),%eax
> 10 13 cmp $0x4c,%al
> ↓ ja 5026
> 1 0 550: lea v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
> movslq (%rdi,%rax,4),%rax
> 9 8 add %rdi,%rax
>
> <SNIP>
>
> So it seems to be working, what am I missing? Is this strictly non
> group related?
>
Yes, it is non group related.
This problem occurs only when different events need to be recorded at
the same time, i.e.:
perf record -e branch-misses -e branch-instructions -a sleep 1
The output results of perf script and perf annotate do not match.
Some events are not output in perf annotate.
> - Arnaldo
> .
>
Powered by blists - more mailing lists