Message-ID: <20161025090651.GC3175@twins.programming.kicks-ass.net>
Date:   Tue, 25 Oct 2016 11:06:51 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     kernel test robot <xiaolong.ye@...el.com>
Cc:     Jiri Olsa <jolsa@...hat.com>, Michael Neuling <mikey@...ling.org>,
        Paul Mackerras <paulus@...ba.org>,
        Jiri Olsa <jolsa@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jan Stancek <jstancek@...hat.com>, lkp@...org
Subject: Re: [lkp] [perf powerpc]  18d1796d0b: [No primary change]

On Tue, Oct 25, 2016 at 02:40:13PM +0800, kernel test robot wrote:
> [will-it-scale] perf-stat.branch-miss-rate +7.4% regression 
> Reply-To: kernel test robot <xiaolong.ye@...el.com>
> User-Agent: Heirloom mailx 12.5 6/20/10
> 
> 
> FYI, we noticed a +7.4% regression of perf-stat.branch-miss-rate due to commit:
> 
> commit 18d1796d0b45762ec6f58c5ed2ad3f7510ffbaa9 ("perf powerpc: Don't call perf_event_disable from atomic context")
> https://github.com/0day-ci/linux Jiri-Olsa/perf-powerpc-Don-t-call-perf_event_disable-from-atomic-context/20161006-203500
> 
> in testcase: will-it-scale
> on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
> with following parameters:
> 
> 	test: poll2
> 	cpufreq_governor: performance
> 
> Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
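
A minimal sketch of that scaling loop, for illustration only (./testcase is a
placeholder; this is not the actual will-it-scale harness, which also builds a
threads-based variant of each testcase):

        # Run the same testcase with 1, 2, ... nproc parallel copies; if the time for a
        # fixed amount of work grows as copies are added, the testcase does not scale.
        for n in $(seq 1 "$(nproc)"); do
                start=$(date +%s)
                for i in $(seq "$n"); do ./testcase & done
                wait
                echo "$n parallel copies finished in $(( $(date +%s) - start ))s"
        done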

> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> To reproduce:
> 
>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
>   gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/poll2/will-it-scale
> 
> commit: 
>   41aad2a6d4 (" perf/core improvements and fixes:")
>   18d1796d0b ("perf powerpc: Don't call perf_event_disable from atomic context")
> 
> 41aad2a6d4fcdda8 18d1796d0b45762ec6f58c5ed2 
> ---------------- -------------------------- 
>        fail:runs  %reproduction    fail:runs
>            |             |             |    
>          %stddev     %change         %stddev
>              \          |                \  
>       0.19 ±  0%      +7.4%       0.21 ±  0%  perf-stat.branch-miss-rate%
>  9.591e+09 ±  1%      +9.1%  1.047e+10 ±  0%  perf-stat.branch-misses
>  1.962e+09 ±  0%      +2.3%  2.008e+09 ±  1%  perf-stat.cache-references
>      51.18 ±  2%      +5.6%      54.06 ±  1%  perf-stat.iTLB-load-miss-rate%
>   46430577 ±  5%      -6.9%   43241506 ±  2%  perf-stat.iTLB-loads
>       9.90 ±  4%      +9.3%      10.82 ±  4%  turbostat.Pkg%pc2
>      62066 ± 24%     +34.7%      83582 ± 11%  numa-meminfo.node1.Active
>      49531 ± 30%     +42.9%      70778 ± 13%  numa-meminfo.node1.Active(anon)
>      27883 ±100%    -100.0%       0.00 ± -1%  latency_stats.avg.proc_cgroup_show.proc_single_show.seq_read.__vfs_read.vfs_read.SyS_read.do_syscall_64.return_from_SYSCALL_64
>      27883 ±100%    -100.0%       0.00 ± -1%  latency_stats.max.proc_cgroup_show.proc_single_show.seq_read.__vfs_read.vfs_read.SyS_read.do_syscall_64.return_from_SYSCALL_64
>      32685 ± 38%     +88.5%      61603 ±147%  latency_stats.sum.call_rwsem_down_write_failed.path_openat.do_filp_open.do_sys_open.SyS_open.entry_SYSCALL_64_fastpath
>      27883 ±100%    -100.0%       0.00 ± -1%  latency_stats.sum.proc_cgroup_show.proc_single_show.seq_read.__vfs_read.vfs_read.SyS_read.do_syscall_64.return_from_SYSCALL_64
>      92795 ±  4%      -8.6%      84853 ±  6%  numa-vmstat.node0.numa_hit
>      92782 ±  4%      -8.5%      84851 ±  6%  numa-vmstat.node0.numa_local
>      12381 ± 30%     +42.9%      17694 ± 13%  numa-vmstat.node1.nr_active_anon
>      12381 ± 30%     +42.9%      17694 ± 13%  numa-vmstat.node1.nr_zone_active_anon
>      21.80 ± 59%     -69.8%       6.58 ± 83%  sched_debug.cpu.clock.stddev
>      21.80 ± 59%     -69.8%       6.58 ± 83%  sched_debug.cpu.clock_task.stddev
>       0.00 ± 23%     -34.3%       0.00 ± 20%  sched_debug.cpu.next_balance.stddev
>      35829 ±  9%     -18.4%      29221 ±  6%  sched_debug.cpu.nr_switches.max
>       8361 ±  6%     -13.4%       7243 ±  7%  sched_debug.cpu.nr_switches.stddev
>       8.43 ± 11%     -25.2%       6.30 ± 12%  sched_debug.cpu.nr_uninterruptible.stddev
>      18057 ±  6%     -14.3%      15482 ±  8%  sched_debug.cpu.sched_count.stddev
> 
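As a rough cross-check of the branch counters above (illustrative arithmetic
only): branch-miss-rate% is branch-misses divided by branches, so with misses
up +9.1% and the rate up +7.4%, the implied change in the branch count itself
is only about +1.6%:

        echo 'scale=4; 1.091 / 1.074' | bc    # ~1.0158, i.e. branches nearly unchanged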

ARGH... so what is the normal metric for this test, and did that change?
And why can I still not find that? These reports suck!

The result doesn't make sense: my gcc inlines the function call, and the
emitted code is very similar to the old code, with the exception of one
extra symbol.
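
For reference, one way to compare the emitted code between the two commits (a
sketch; the vmlinux names are placeholders for builds of each commit):

        # bloat-o-meter ships in the kernel tree and reports per-symbol size deltas
        # between two builds.
        ./scripts/bloat-o-meter vmlinux.before vmlinux.after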

Are you sure this isn't simple run-to-run variation?
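
A quick way to check that (a sketch; the workload command below is a
placeholder, not the actual lkp invocation): perf stat's --repeat option
reports the mean and stddev across runs.

        # Repeat the measurement and look at the reported stddev; a ~7% delta that sits
        # inside the run-to-run spread is noise rather than a real change.
        perf stat -r 10 -e branches,branch-misses,cache-references -- <benchmark command>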
