Date:	Fri, 25 Mar 2016 20:36:17 +0800
From:	"Wangnan (F)" <wangnan0@...wei.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	<mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
	He Kuang <hekuang@...wei.com>,
	Alexei Starovoitov <ast@...nel.org>,
	"Arnaldo Carvalho de Melo" <acme@...hat.com>,
	Brendan Gregg <brendan.d.gregg@...il.com>,
	"Jiri Olsa" <jolsa@...nel.org>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Namhyung Kim <namhyung@...nel.org>,
	Zefan Li <lizefan@...wei.com>, <pi3orama@....com>
Subject: Re: [PATCH 3/5] perf core: Prepare writing into ring buffer from end



On 2016/3/25 20:26, Wangnan (F) wrote:
>
>
> On 2016/3/23 17:50, Peter Zijlstra wrote:
>> On Mon, Mar 14, 2016 at 09:59:43AM +0000, Wang Nan wrote:
>>> Convert perf_output_begin to __perf_output_begin and make the later
>>> function able to write records from the end of the ring buffer.
>>> Following commits will utilize the 'backward' flag.
>>>
>>> This patch doesn't introduce any extra performance overhead since we
>>> use always_inline.
>> So while I agree that with __always_inline and constant propagation we
>> _should_ end up with the same code, we have:
>>
>> $ size defconfig-build/kernel/events/ring_buffer.o.{pre,post}
>>     text    data     bss     dec     hex filename
>>     3785       2       0    3787     ecb defconfig-build/kernel/events/ring_buffer.o.pre
>>     3673       2       0    3675     e5b defconfig-build/kernel/events/ring_buffer.o.post
>>
>> The patch actually makes the file shrink.
>>
>> So I think we still want to have some actual performance numbers.
>
> In my environment the two objects are nearly identical:
>
>
> $ objdump -d kernel/events/ring_buffer.o.new  > ./out.new.S
> $ objdump -d kernel/events/ring_buffer.o.old  > ./out.old.S
>
> --- ./out.old.S    2016-03-25 12:18:52.060656423 +0000
> +++ ./out.new.S    2016-03-25 12:18:45.376630269 +0000
> @@ -1,5 +1,5 @@
>
> -kernel/events/ring_buffer.o.old:     file format elf64-x86-64
> +kernel/events/ring_buffer.o.new:     file format elf64-x86-64
>
>
>  Disassembly of section .text:
> @@ -320,7 +320,7 @@
>   402:    4d 8d 04 0f              lea    (%r15,%rcx,1),%r8
>   406:    48 89 c8                 mov    %rcx,%rax
>   409:    4c 0f b1 43 40           cmpxchg %r8,0x40(%rbx)
> - 40e:    48 39 c8                 cmp    %rcx,%rax
> + 40e:    48 39 c1                 cmp    %rax,%rcx
>   411:    75 b4                    jne    3c7 <perf_output_begin+0xc7>
>   413:    48 8b 73 58              mov    0x58(%rbx),%rsi
>   417:    48 8b 43 68              mov    0x68(%rbx),%rax
> @@ -357,7 +357,7 @@
>   480:    85 c0                    test   %eax,%eax
>   482:    0f 85 02 ff ff ff        jne    38a <perf_output_begin+0x8a>
>   488:    48 c7 c2 00 00 00 00     mov    $0x0,%rdx
> - 48f:    be 7c 00 00 00           mov    $0x7c,%esi
> + 48f:    be 89 00 00 00           mov    $0x89,%esi
>   494:    48 c7 c7 00 00 00 00     mov    $0x0,%rdi
>   49b:    c6 05 00 00 00 00 01     movb   $0x1,0x0(%rip)        # 4a2 <perf_output_begin+0x1a2>
>   4a2:    e8 00 00 00 00           callq  4a7 <perf_output_begin+0x1a7>
> @@ -874,7 +874,7 @@
>   c39:    eb e7                    jmp    c22 <perf_aux_output_begin+0x172>
>   c3b:    80 3d 00 00 00 00 00     cmpb   $0x0,0x0(%rip)        # c42 <perf_aux_output_begin+0x192>
>   c42:    75 93                    jne    bd7 <perf_aux_output_begin+0x127>
> - c44:    be 2b 01 00 00           mov    $0x12b,%esi
> + c44:    be 49 01 00 00           mov    $0x149,%esi
>   c49:    48 c7 c7 00 00 00 00     mov    $0x0,%rdi
>   c50:    e8 00 00 00 00           callq  c55 <perf_aux_output_begin+0x1a5>
>   c55:    c6 05 00 00 00 00 01     movb   $0x1,0x0(%rip)        # c5c <perf_aux_output_begin+0x1ac>
>
>
> I think you enabled some unusual config options?
>

You must have enabled CONFIG_OPTIMIZE_INLINING. Now I get a similar result:

$ size kernel/events/ring_buffer.o*
    text       data        bss        dec        hex    filename
    4545          4          8       4557       11cd    kernel/events/ring_buffer.o.new
    4641          4          8       4653       122d    kernel/events/ring_buffer.o.old
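
For reference, the pattern being measured can be sketched roughly as below. This is only an illustration of an always-inline helper taking a constant 'backward' flag, not the code from the patch: struct demo_rb, the demo_* function names and the offset arithmetic are all made up for the example. The idea is that each wrapper passes a compile-time constant, so the compiler may drop the unused branch in each copy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ring buffer state; not the kernel's struct. */
struct demo_rb {
	size_t head;	/* free-running write position */
	size_t size;	/* buffer size, assumed to be a power of two */
};

/*
 * Shared body. 'backward' is a constant at every call site below, so with
 * forced inlining the compiler can fold the branch away in each wrapper.
 */
static inline __attribute__((always_inline)) size_t
__demo_output_begin(struct demo_rb *rb, size_t len, bool backward)
{
	size_t offset;

	if (backward) {
		/* Reserve space below the current head. */
		rb->head -= len;
		offset = rb->head;
	} else {
		/* Reserve space starting at the current head. */
		offset = rb->head;
		rb->head += len;
	}

	/* Wrap the free-running position into the buffer. */
	return offset & (rb->size - 1);
}

/* Forward writer: the flag is the constant 'false'. */
size_t demo_output_begin_forward(struct demo_rb *rb, size_t len)
{
	return __demo_output_begin(rb, len, false);
}

/* Backward writer: the flag is the constant 'true'. */
size_t demo_output_begin_backward(struct demo_rb *rb, size_t len)
{
	return __demo_output_begin(rb, len, true);
}
```

Whether the generated code really matches a hand-written version depends on the compiler and on options such as CONFIG_OPTIMIZE_INLINING, which is why comparing objects with size and objdump as above is still useful.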

Thank you.

