Message-ID: <20171206021450.GE21779@yexl-desktop>
Date: Wed, 6 Dec 2017 10:14:50 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Andy Lutomirski <luto@...capital.net>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp-robot] [locking/x86] 450cbdd012: will-it-scale.per_process_ops
16.1% improvement
Greetings,
FYI, we noticed a 16.1% improvement in will-it-scale.per_process_ops due to commit:
commit: 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730 ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
with following parameters:
test: futex3
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
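As a rough illustration of what a per-process operation count measures, the sketch below counts FUTEX_WAKE calls on a word that nobody waits on. It assumes, based on the futex_wake frames in the profile further down in this report, that the futex3 loop body is a wake call. The name futex_wake_loop and the iteration count are hypothetical, and this is not the will-it-scale source.

```c
/* Sketch only: count FUTEX_WAKE calls on a futex word with no waiters.
 * Linux-specific; futex_wake_loop is a hypothetical name, not will-it-scale code. */
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the number of completed wake calls, or -1 if a call fails. */
static long futex_wake_loop(uint32_t *word, long iters)
{
    long ops = 0;

    assert(word != NULL);
    for (long i = 0; i < iters; i++) {
        /* Wake at most one waiter; with no waiters the call returns 0. */
        long woken = syscall(SYS_futex, word, FUTEX_WAKE, 1, NULL, NULL, 0);
        if (woken < 0)
            return -1;
        ops++;
    }
    return ops;
}
```

Running such a loop in each of n forked processes or n threads and summing the counts would give a figure comparable in spirit to per_process_ops or per_thread_ops.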
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/futex3/will-it-scale
commit:
b04db8e19f ("rcu: Use lockdep to assert IRQs are disabled/enabled")
450cbdd012 ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")
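For readers unfamiliar with the two barrier idioms named in the commit title, here is a minimal userspace sketch for GCC or Clang on x86-64. It is not the kernel source: the function names, the stack offset, and the fallback for other architectures are assumptions chosen for illustration.

```c
/* Userspace sketch of two full-barrier idioms; names are hypothetical. */
#include <assert.h>
#include <stddef.h>

static inline void barrier_mfence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* fallback on other architectures */
#endif
}

static inline void barrier_lock_add(void)
{
#if defined(__x86_64__)
    /* Locked add of zero to a stack slot: the value is unchanged, but the
     * LOCK prefix orders earlier loads and stores before later ones. */
    __asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
#else
    __sync_synchronize();
#endif
}

static volatile int flag_a, flag_b;

/* Store-then-load pattern that needs a full barrier between the two
 * accesses; the barrier implementation is passed in by the caller. */
static int publish_then_check(void (*full_barrier)(void))
{
    assert(full_barrier != NULL);
    flag_a = 1;
    full_barrier();
    return flag_b;
}
```

Both helpers are interchangeable from the caller's point of view, for example publish_then_check(barrier_lock_add). The report measures only the cost difference between the two instructions inside the kernel's smp_mb().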
b04db8e19fc2e913  450cbdd0125cfa5d7bbf9e2a6b
----------------  --------------------------
value [± %stddev]  %change  value [± %stddev]  metric
9560719 +16.1% 11095797 will-it-scale.per_process_ops
9562708 +17.1% 11197397 will-it-scale.per_thread_ops
0.64 -7.3% 0.59 will-it-scale.scalability
2019 -3.3% 1953 will-it-scale.time.system_time
490.82 +13.5% 556.90 will-it-scale.time.user_time
8.16 +1.1 9.26 mpstat.cpu.usr%
1834 ± 12% -15.1% 1558 ± 2% vmstat.system.cs
30.11 ± 3% -4.0% 28.91 boot-time.boot
883.63 ± 4% -5.2% 838.05 ± 2% boot-time.idle
1752 -60.3% 696.00 ± 87% meminfo.Mlocked
1752 -60.3% 696.00 ± 87% meminfo.Unevictable
991.50 -60.5% 391.25 ± 86% numa-meminfo.node1.Mlocked
991.50 -60.5% 391.25 ± 86% numa-meminfo.node1.Unevictable
104.91 +2.7% 107.75 turbostat.CorWatt
132.50 +2.2% 135.36 turbostat.PkgWatt
247.25 -60.5% 97.75 ± 86% numa-vmstat.node1.nr_mlock
247.25 -60.5% 97.75 ± 86% numa-vmstat.node1.nr_unevictable
247.25 -60.5% 97.75 ± 86% numa-vmstat.node1.nr_zone_unevictable
5116 ± 2% +10.7% 5664 ± 3% slabinfo.cred_jar.active_objs
5116 ± 2% +10.7% 5664 ± 3% slabinfo.cred_jar.num_objs
1827 ± 7% +19.3% 2180 ± 3% slabinfo.fsnotify_mark_connector.active_objs
1827 ± 7% +19.3% 2180 ± 3% slabinfo.fsnotify_mark_connector.num_objs
6521 ± 3% +13.5% 7399 slabinfo.kmalloc-96.active_objs
6584 ± 3% +12.8% 7424 slabinfo.kmalloc-96.num_objs
1.99e+12 +14.6% 2.281e+12 perf-stat.branch-instructions
0.01 ± 2% -0.0 0.01 ± 3% perf-stat.branch-miss-rate%
2.43e+08 +2.3% 2.486e+08 perf-stat.cache-misses
690932 ± 12% -15.2% 586079 ± 2% perf-stat.context-switches
1.03 -12.7% 0.90 perf-stat.cpi
1.63e+13 -1.1% 1.611e+13 perf-stat.cpu-cycles
3.459e+12 +13.5% 3.925e+12 perf-stat.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 5% perf-stat.dTLB-store-miss-rate%
2.881e+12 ± 3% +16.3% 3.351e+12 perf-stat.dTLB-stores
2.956e+09 ± 46% +146.2% 7.279e+09 ± 27% perf-stat.iTLB-load-misses
1.584e+13 +13.2% 1.794e+13 perf-stat.instructions
8495 ± 83% -68.8% 2651 ± 26% perf-stat.instructions-per-iTLB-miss
0.97 +14.5% 1.11 perf-stat.ipc
32.91 -0.9 31.99 perf-stat.node-load-miss-rate%
31.09 -3.2 27.86 perf-stat.node-store-miss-rate%
84812551 ± 2% -13.9% 72994159 ± 2% perf-stat.node-store-misses
53.36 -8.0 45.33 perf-profile.calltrace.cycles.do_futex.sys_futex.entry_SYSCALL_64_fastpath
57.06 -7.0 50.02 perf-profile.calltrace.cycles.sys_futex.entry_SYSCALL_64_fastpath
62.70 -5.7 56.96 perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
17.33 ± 3% -4.9 12.46 perf-profile.calltrace.cycles.hash_futex.do_futex.sys_futex.entry_SYSCALL_64_fastpath
30.58 -4.5 26.11 ± 2% perf-profile.calltrace.cycles.futex_wake.do_futex.sys_futex.entry_SYSCALL_64_fastpath
19.36 -2.5 16.89 ± 3% perf-profile.calltrace.cycles.get_futex_key.futex_wake.do_futex.sys_futex.entry_SYSCALL_64_fastpath
13.13 ± 2% -1.9 11.19 ± 3% perf-profile.calltrace.cycles.get_futex_key_refs.get_futex_key.futex_wake.do_futex.sys_futex
10.09 ± 13% -0.4 9.71 ± 14% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
12.38 ± 9% -0.4 12.02 ± 11% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.verify_cpu
12.38 ± 9% -0.4 12.02 ± 11% perf-profile.calltrace.cycles.start_secondary.verify_cpu
12.36 ± 9% -0.4 12.00 ± 11% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.verify_cpu
12.67 ± 9% -0.4 12.31 ± 11% perf-profile.calltrace.cycles.verify_cpu
12.11 ± 9% -0.3 11.77 ± 11% perf-profile.calltrace.cycles.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry
12.20 ± 9% -0.3 11.86 ± 11% perf-profile.calltrace.cycles.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry.start_secondary
12.20 ± 9% -0.3 11.87 ± 11% perf-profile.calltrace.cycles.call_cpuidle.do_idle.cpu_startup_entry.start_secondary.verify_cpu
20.03 ± 3% +5.2 25.19 ± 2% perf-profile.calltrace.cycles.entry_SYSCALL_64
54.06 -7.8 46.21 perf-profile.children.cycles.do_futex
57.42 -6.9 50.49 perf-profile.children.cycles.sys_futex
62.78 -5.7 57.07 perf-profile.children.cycles.entry_SYSCALL_64_fastpath
17.30 -5.3 11.96 ± 4% perf-profile.children.cycles.get_futex_key_refs
18.02 ± 3% -4.7 13.32 perf-profile.children.cycles.hash_futex
30.97 -4.4 26.59 ± 2% perf-profile.children.cycles.futex_wake
19.74 -2.4 17.36 ± 3% perf-profile.children.cycles.get_futex_key
10.19 ± 14% -0.4 9.79 ± 15% perf-profile.children.cycles.poll_idle
12.65 ± 9% -0.4 12.29 ± 11% perf-profile.children.cycles.do_idle
12.38 ± 9% -0.4 12.02 ± 11% perf-profile.children.cycles.start_secondary
12.67 ± 9% -0.4 12.31 ± 11% perf-profile.children.cycles.verify_cpu
12.67 ± 9% -0.4 12.31 ± 11% perf-profile.children.cycles.cpu_startup_entry
12.39 ± 10% -0.3 12.04 ± 11% perf-profile.children.cycles.cpuidle_enter_state
12.49 ± 10% -0.3 12.15 ± 11% perf-profile.children.cycles.call_cpuidle
12.48 ± 10% -0.3 12.14 ± 11% perf-profile.children.cycles.cpuidle_enter
20.03 ± 3% +5.2 25.19 ± 2% perf-profile.children.cycles.entry_SYSCALL_64
17.13 -5.3 11.82 ± 3% perf-profile.self.cycles.get_futex_key_refs
17.94 ± 3% -4.8 13.14 perf-profile.self.cycles.hash_futex
10.16 ± 14% -0.4 9.72 ± 14% perf-profile.self.cycles.poll_idle
6.51 -0.4 6.16 perf-profile.self.cycles.get_futex_key
4.38 ± 2% +0.9 5.33 ± 2% perf-profile.self.cycles.entry_SYSCALL_64_fastpath
4.69 ± 2% +1.1 5.74 ± 2% perf-profile.self.cycles.do_futex
6.35 ± 2% +1.2 7.56 ± 2% perf-profile.self.cycles.futex_wake
20.03 ± 3% +5.2 25.19 ± 2% perf-profile.self.cycles.entry_SYSCALL_64
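The %change column and the perf-stat.cpi rows in the table above can be cross-checked with simple arithmetic. The helpers below are a sketch with hypothetical names (pct_change, cpi); they assume %change is the relative difference between the two commits' values and that cpi is cycles divided by instructions.

```c
#include <assert.h>

/* Relative change in percent between the base and patched values. */
static double pct_change(double base, double patched)
{
    assert(base != 0.0);
    return (patched - base) / base * 100.0;
}

/* Cycles per instruction; its reciprocal is instructions per cycle. */
static double cpi(double cycles, double instructions)
{
    assert(instructions != 0.0);
    return cycles / instructions;
}
```

With the table's numbers, pct_change(9560719, 11095797) is about 16.06, in line with the reported +16.1%. cpi(1.63e13, 1.584e13) is about 1.03 and cpi(1.611e13, 1.794e13) is about 0.90, in line with the perf-stat.cpi row.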
will-it-scale.per_process_ops
[ASCII plot: y-axis from 0 to 1.2e+07. The baseline series runs just below 1e+07; the [O] samples sit between 1e+07 and 1.2e+07, with one [O] sample at 0.]
will-it-scale.per_thread_ops
[ASCII plot: y-axis from 0 to 1.2e+07. The baseline series runs just below 1e+07; the [O] samples sit between 1e+07 and 1.2e+07, with one [O] sample at 0.]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong
View attachment "config-4.14.0-rc8-00080-g450cbdd" of type "text/plain" (163362 bytes)
View attachment "job-script" of type "text/plain" (7133 bytes)
View attachment "job.yaml" of type "text/plain" (4757 bytes)
View attachment "reproduce" of type "text/plain" (328 bytes)