lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 27 Mar 2021 10:15:56 +0800
From:   Yang Jihong <yangjihong1@...wei.com>
To:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Yao Jin <yao.jin@...ux.intel.com>, <gustavoars@...nel.org>,
        <mliska@...e.cz>, linux-kernel <linux-kernel@...r.kernel.org>,
        <zhangjinhao2@...wei.com>
Subject: Re: [PATCH v7] perf annotate: Fix sample events lost in stdio mode

Hello,

On 2021/3/26 20:06, Arnaldo Carvalho de Melo wrote:
> Em Fri, Mar 26, 2021 at 12:25:37PM +0900, Namhyung Kim escreveu:
>> On Fri, Mar 26, 2021 at 11:24 AM Yang Jihong <yangjihong1@...wei.com> wrote:
>>> On 2021/3/19 20:35, Yang Jihong wrote:
>>>> In hist__find_annotations function, since different hist_entry may point to same
>>>> symbol, we free notes->src to signal already processed this symbol in stdio mode;
>>>> when annotate, entry will skipped if notes->src is NULL to avoid repeated output.
>>>>
>>>> However, there is a problem, for example, run the following command:
>>>>
>>>>    # perf record -e branch-misses -e branch-instructions -a sleep 1
>>>>
>>>> perf.data file contains different types of sample event.
>>>>
>>>> If the same IP sample event exists in branch-misses and branch-instructions,
>>>> this event uses the same symbol. When annotate branch-misses events, notes->src
>>>> corresponding to this event is set to null, as a result, when annotate
>>>> branch-instructions events, this event is skipped and no annotate is output.
>>>>
>>>> Solution of this patch is to remove zfree in hists__find_annotations and
>>>> change sort order to "dso,symbol" to avoid duplicate output when different
>>>> processes correspond to the same symbol.
> 
>>>> Signed-off-by: Yang Jihong <yangjihong1@...wei.com>
> 
>> Acked-by: Namhyung Kim <namhyung@...nel.org>
> 
> Without looking at the patch, just at its description of the problem, I
> tried to annotate two events in a group, to get the annotate group view
> output with both events, and it seems I'm getting samples accounted for
> both events:
> 
> [root@...e ~]# perf record -e '{branch-misses,branch-instructions}' -a sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.296 MB perf.data (2507 samples) ]
> [root@...e ~]#
> [root@...e ~]# perf report | grep -v '^#' | head -5
>      17.49%  19.19%  ThreadPoolForeg  chromium-browser              [.] v8::internal::ConcurrentMarking::Run
>      12.17%  17.04%  ThreadPoolForeg  chromium-browser              [.] v8::internal::Sweeper::RawSweep
>      11.14%  11.63%  ThreadPoolForeg  chromium-browser              [.] v8::internal::MarkingVisitorBase<v8::internal::ConcurrentMarkingVisitor, v8::internal::ConcurrentMarkingState>::ProcessStrongHeapObject<v8::internal::CompressedHeapObjectSlot>
>       7.65%   7.84%  ThreadPoolForeg  chromium-browser              [.] v8::internal::ConcurrentMarkingVisitor::ShouldVisit
>       5.66%   6.72%  ThreadPoolForeg  chromium-browser              [.] v8::internal::ConcurrentMarkingVisitor::VisitPointersInSnapshot
> 
> [root@...e ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
> Samples: 444  of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
> v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
> Percent
>                               
>                               
>                       Disassembly of section .text:
>                               
>                       0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
>                       v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
>                         push       %rbp
>                         mov        %rsp,%rbp
>                         push       %r15
>                         push       %r14
>                         mov        %rdi,%r14
>                         push       %r13
>                         mov        %edx,%r13d
>                         push       %r12
>                         mov        %ecx,%r12d
>                         push       %rbx
>                         sub        $0x1298,%rsp
>                         mov        %rsi,-0x1228(%rbp)
>                         mov        %fs:0x28,%rax
>                         mov        %rax,-0x38(%rbp)
> <SNIP>		
>                         movzwl     0x2(%rbx),%eax
>                         test       %ax,%ax
>                       ↓ jne        4a9
>                         mov        -0x10e8(%rbp),%rdx
>                         cmpw       $0x0,0x2(%rdx)
>    0.41   0.39        ↓ je         4b90
>                         movq       %rbx,%xmm0
>                         movq       %rdx,%xmm2
>                         mov        %rdx,%rbx
>                         punpcklqdq %xmm2,%xmm0
>                         movups     %xmm0,-0x10e8(%rbp)
>                         movzwl     0x2(%rdx),%eax
>                  4a9:   sub        $0x1,%eax
>                         mov        %ax,0x2(%rbx)
>    0.36   0.91          movzwl     %ax,%eax
>    0.60   0.00          mov        0x10(%rbx,%rax,8),%rax
>    3.44   2.46          mov        %rax,-0x11e0(%rbp)
>    0.00   0.34   4bf:   mov        0x8(%r13),%rax
>    0.00   0.36          add        $0x1,%r15d
>    0.00   0.34          mov        0x110(%rax),%rax
>                         mov        0x128(%rax),%rcx
>    0.88   0.36          mov        0x8(%r13),%rax
>                         mov        0x110(%rax),%rdx
>                         mov        0x130(%rdx),%rdx
>    0.00   0.48          mov        0x140(%rax),%rax
>                         mov        0x110(%rax),%rsi
>    0.61   0.47          mov        -0x11e0(%rbp),%rax
>    2.01   2.32          sub        $0x1,%rax
>                         cmp        %rcx,%rax
>    0.00   0.35          setae      %cl
>    1.31   0.33          cmp        %rdx,%rax
>                         setb       %dl
>    0.00   0.24          test       %dl,%cl
>    0.00   0.12        ↓ jne        4b70
>                         cmp        %rsi,%rax
>                       ↓ je         4b70
>                         mov        (%rax),%eax
>   29.10  29.90          add        -0x1220(%rbp),%rax
>                         cmpb       $0x0,-0x1218(%rbp)
>                         mov        %rax,-0x1210(%rbp)
>    0.00   0.65        ↓ jne        4fa0
>                         mov        -0x11e0(%rbp),%r9
>                         lea        0x6(%rax),%rbx
>    0.38   0.00   545:   movzbl     (%rbx),%eax
>    4.90   5.34          cmp        $0x4c,%al
>                       ↓ ja         5026
>    0.58   0.00   550:   lea        v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
>                         movslq     (%rdi,%rax,4),%rax
>    4.97   3.19          add        %rdi,%rax
> <SNIP>
> 
> If I ask for number of samples:
> 
> [root@...e ~]# perf config annotate.show_nr_samples=true
> [root@...e ~]# perf annotate --stdio2 v8::internal::ConcurrentMarking::Run
> 
> Samples: 444  of events 'anon group { branch-misses, branch-instructions }', 4000 Hz, Event count (approx.): 596221, [percent: local period]
> v8::internal::ConcurrentMarking::Run() /usr/lib64/chromium-browser/chromium-browser
> Samples
>                               
>                               
>                       Disassembly of section .text:
>                               
>                       0000000003290b30 <v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool)>:
>                       v8::internal::ConcurrentMarking::Run(v8::JobDelegate*, unsigned int, bool):
>                         push       %rbp
>                         mov        %rsp,%rbp
>                         push       %r15
>                         push       %r14
>                         mov        %rdi,%r14
>                         push       %r13
>                         mov        %edx,%r13d
>                         push       %r12
>                         mov        %ecx,%r12d
>                         push       %rbx
>                         sub        $0x1298,%rsp
>                         mov        %rsi,-0x1228(%rbp)
>                         mov        %fs:0x28,%rax
>                         mov        %rax,-0x38(%rbp)
> <SNIP>
>                         movzwl     0x2(%rbx),%eax
>                         test       %ax,%ax
>                       ↓ jne        4a9
>                         mov        -0x10e8(%rbp),%rdx
>                         cmpw       $0x0,0x2(%rdx)
>       1      1        ↓ je         4b90
>                         movq       %rbx,%xmm0
>                         movq       %rdx,%xmm2
>                         mov        %rdx,%rbx
>                         punpcklqdq %xmm2,%xmm0
>                         movups     %xmm0,-0x10e8(%rbp)
>                         movzwl     0x2(%rdx),%eax
>                  4a9:   sub        $0x1,%eax
>                         mov        %ax,0x2(%rbx)
>       1      2          movzwl     %ax,%eax
>       1      0          mov        0x10(%rbx,%rax,8),%rax
>       8      5          mov        %rax,-0x11e0(%rbp)
>       0      1   4bf:   mov        0x8(%r13),%rax
>       0      1          add        $0x1,%r15d
>       0      1          mov        0x110(%rax),%rax
>                         mov        0x128(%rax),%rcx
>       3      1          mov        0x8(%r13),%rax
>                         mov        0x110(%rax),%rdx
>                         mov        0x130(%rdx),%rdx
>       0      1          mov        0x140(%rax),%rax
>                         mov        0x110(%rax),%rsi
>       2      1          mov        -0x11e0(%rbp),%rax
>       6      6          sub        $0x1,%rax
>                         cmp        %rcx,%rax
>       0      1          setae      %cl
>       2      1          cmp        %rdx,%rax
>                         setb       %dl
>       0      1          test       %dl,%cl
>       0      1        ↓ jne        4b70
>                         cmp        %rsi,%rax
>                       ↓ je         4b70
>                         mov        (%rax),%eax
>      58     73          add        -0x1220(%rbp),%rax
>                         cmpb       $0x0,-0x1218(%rbp)
>                         mov        %rax,-0x1210(%rbp)
>       0      1        ↓ jne        4fa0
>                         mov        -0x11e0(%rbp),%r9
>                         lea        0x6(%rax),%rbx
>       1      0   545:   movzbl     (%rbx),%eax
>      10     13          cmp        $0x4c,%al
>                       ↓ ja         5026
>       1      0   550:   lea        v8::internal::FLAGDEFAULT_abort_on_contradictory_flags+0x457,%rdi
>                         movslq     (%rdi,%rax,4),%rax
>       9      8          add        %rdi,%rax
> 
> <SNIP>
> 
> So it seems to be working, what am I missing? Is this strictly non
> group related?
> 
Yes, it is non group related.
This problem occurs only when different events need to be recorded at 
the same time, i.e.:
perf record -e branch-misses -e branch-instructions -a sleep 1

The output results of perf script and perf annotate do not match.
Some events are not output in perf annotate.

> - Arnaldo
> .
> 

Powered by blists - more mailing lists