Message-ID: <20181121004450.GB18977@shao2-debian>
Date: Wed, 21 Nov 2018 08:44:50 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: NeilBrown <neilb@...e.com>
Cc: Jeff Layton <jlayton@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Jeff Layton <jlayton@...hat.com>, lkp@...org
Subject: [LKP] [fs/locks] 3c19f2312f: will-it-scale.per_thread_ops -65.2% regression
Greetings,
FYI, we noticed a -65.2% regression of will-it-scale.per_thread_ops due to commit:
commit: 3c19f2312f48a3d36a4e13f5072a6a95e755b3d5 ("fs/locks: always delete_block after waiting.")
https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git locks-4.21
in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
with following parameters:
nr_task: 100%
mode: thread
test: lock1
ucode: 0xb00002e
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-bdw-ep3b/lock1/will-it-scale/0xb00002e
commit:
816f2fb5a2 ("fs/locks: allow a lock request to block other requests.")
3c19f2312f ("fs/locks: always delete_block after waiting.")
816f2fb5a2fc678c 3c19f2312f48a3d36a4e13f507
---------------- --------------------------
%stddev %change %stddev
\ | \
71447 -65.2% 24854 will-it-scale.per_thread_ops
138940 -2.9% 134886 will-it-scale.time.involuntary_context_switches
279.85 -64.2% 100.29 will-it-scale.time.user_time
6287454 -65.2% 2187242 will-it-scale.workload
1.09 -0.7 0.42 mpstat.cpu.usr%
371230 ± 4% +9.5% 406403 softirqs.SCHED
1803 ± 16% +48.9% 2685 ± 8% numa-meminfo.node0.PageTables
2784 ± 10% -30.7% 1928 ± 12% numa-meminfo.node1.PageTables
224.55 -1.8% 220.57 turbostat.PkgWatt
7.70 -3.0% 7.47 turbostat.RAMWatt
450.50 ± 17% +49.0% 671.25 ± 8% numa-vmstat.node0.nr_page_table_pages
644147 ± 10% -19.8% 516646 ± 11% numa-vmstat.node0.numa_hit
639812 ± 10% -20.6% 508027 ± 12% numa-vmstat.node0.numa_local
696.25 ± 10% -30.7% 482.50 ± 12% numa-vmstat.node1.nr_page_table_pages
4617 +2.1% 4715 proc-vmstat.nr_inactive_anon
7097 +2.0% 7241 proc-vmstat.nr_mapped
20507 +7.0% 21934 ± 3% proc-vmstat.nr_shmem
4617 +2.1% 4715 proc-vmstat.nr_zone_inactive_anon
690109 +1.0% 696863 proc-vmstat.numa_hit
672911 +1.0% 679694 proc-vmstat.numa_local
23133 ± 2% +8.9% 25196 ± 4% proc-vmstat.pgactivate
607.03 ± 6% -16.0% 509.80 ± 12% sched_debug.cfs_rq:/.util_est_enqueued.avg
24.42 ± 28% +38.2% 33.75 ± 22% sched_debug.cpu.cpu_load[2].max
2.20 ± 28% +39.9% 3.08 ± 23% sched_debug.cpu.cpu_load[2].stddev
25.33 ± 12% +23.2% 31.21 ± 9% sched_debug.cpu.cpu_load[3].max
2.28 ± 21% +29.6% 2.95 ± 12% sched_debug.cpu.cpu_load[3].stddev
52140 ± 23% +37.1% 71510 ± 3% sched_debug.cpu.nr_switches.max
53379 ± 24% +46.4% 78158 ± 11% sched_debug.cpu.sched_count.max
7132 ± 12% +32.3% 9436 ± 15% sched_debug.cpu.sched_count.stddev
4.587e+12 -7.5% 4.245e+12 perf-stat.branch-instructions
0.24 -0.1 0.15 perf-stat.branch-miss-rate%
1.107e+10 -43.0% 6.312e+09 perf-stat.branch-misses
40.04 -2.0 38.01 perf-stat.cache-miss-rate%
8.415e+09 ± 2% -19.4% 6.782e+09 ± 6% perf-stat.cache-misses
2.101e+10 -15.1% 1.783e+10 ± 5% perf-stat.cache-references
3.85 +10.7% 4.26 perf-stat.cpi
0.00 ± 2% +0.0 0.00 ± 4% perf-stat.dTLB-load-miss-rate%
90399109 ± 2% +6.6% 96381582 ± 4% perf-stat.dTLB-load-misses
4.956e+12 -11.6% 4.38e+12 perf-stat.dTLB-loads
0.00 ± 8% +0.0 0.01 ± 24% perf-stat.dTLB-store-miss-rate%
8.789e+11 -61.0% 3.427e+11 perf-stat.dTLB-stores
80.76 -10.8 69.98 perf-stat.iTLB-load-miss-rate%
3.901e+09 -63.3% 1.43e+09 perf-stat.iTLB-load-misses
9.3e+08 ± 6% -34.0% 6.135e+08 ± 2% perf-stat.iTLB-loads
1.912e+13 -9.6% 1.728e+13 perf-stat.instructions
4901 +146.5% 12081 perf-stat.instructions-per-iTLB-miss
0.26 -9.6% 0.23 perf-stat.ipc
82.36 -3.7 78.70 perf-stat.node-store-miss-rate%
2.319e+09 -20.6% 1.842e+09 perf-stat.node-store-misses
3041599 +159.7% 7898884 perf-stat.path-length
61.02 ± 10% -61.0 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
60.64 ± 10% -60.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.fcntl_setlk.do_fcntl.__x64_sys_fcntl
98.79 +0.7 99.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
98.76 +0.7 99.49 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
98.64 +0.8 99.44 perf-profile.calltrace.cycles-pp.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
97.70 +1.5 99.16 perf-profile.calltrace.cycles-pp.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
97.41 +1.6 99.05 perf-profile.calltrace.cycles-pp.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64.entry_SYSCALL_64_after_hwframe
35.73 ± 18% +62.7 98.45 perf-profile.calltrace.cycles-pp.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl.do_syscall_64
0.00 +65.1 65.07 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk
0.00 +65.3 65.28 perf-profile.calltrace.cycles-pp._raw_spin_lock.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl
0.00 +65.3 65.31 perf-profile.calltrace.cycles-pp.locks_delete_block.do_lock_file_wait.fcntl_setlk.do_fcntl.__x64_sys_fcntl
1.13 ± 2% -0.7 0.43 perf-profile.children.cycles-pp.locks_alloc_lock
0.98 ± 2% -0.6 0.38 perf-profile.children.cycles-pp.kmem_cache_alloc
0.59 -0.4 0.23 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.53 -0.3 0.20 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.35 ± 11% -0.3 0.06 ± 9% perf-profile.children.cycles-pp.fput
0.41 ± 2% -0.3 0.15 ± 3% perf-profile.children.cycles-pp.file_has_perm
0.33 ± 2% -0.2 0.12 ± 6% perf-profile.children.cycles-pp.memset_erms
0.30 ± 3% -0.2 0.11 ± 3% perf-profile.children.cycles-pp.security_file_lock
0.25 ± 3% -0.2 0.10 ± 5% perf-profile.children.cycles-pp.security_file_fcntl
0.24 ± 2% -0.1 0.10 ± 4% perf-profile.children.cycles-pp._copy_from_user
0.22 ± 12% -0.1 0.07 ± 5% perf-profile.children.cycles-pp.__fget_light
0.21 ± 3% -0.1 0.08 ± 6% perf-profile.children.cycles-pp.avc_has_perm
0.20 ± 5% -0.1 0.08 perf-profile.children.cycles-pp.___might_sleep
0.16 ± 5% -0.1 0.06 perf-profile.children.cycles-pp.__fget
0.24 ± 3% -0.1 0.17 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
0.12 ± 5% -0.1 0.05 perf-profile.children.cycles-pp.__might_sleep
0.24 ± 15% -0.1 0.18 perf-profile.children.cycles-pp.locks_insert_lock_ctx
0.11 ± 3% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.locks_free_lock
98.83 +0.7 99.54 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
98.79 +0.7 99.52 perf-profile.children.cycles-pp.do_syscall_64
98.65 +0.8 99.44 perf-profile.children.cycles-pp.__x64_sys_fcntl
97.71 +1.5 99.17 perf-profile.children.cycles-pp.do_fcntl
97.42 +1.6 99.05 perf-profile.children.cycles-pp.fcntl_setlk
94.97 +3.0 97.98 perf-profile.children.cycles-pp._raw_spin_lock
93.97 +3.3 97.24 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
35.74 ± 18% +62.7 98.46 perf-profile.children.cycles-pp.do_lock_file_wait
0.00 +65.3 65.31 perf-profile.children.cycles-pp.locks_delete_block
0.59 -0.4 0.23 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.53 -0.3 0.20 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.35 ± 10% -0.3 0.06 ± 9% perf-profile.self.cycles-pp.fput
1.00 ± 2% -0.3 0.74 perf-profile.self.cycles-pp._raw_spin_lock
0.38 ± 3% -0.2 0.14 perf-profile.self.cycles-pp.kmem_cache_alloc
0.32 ± 2% -0.2 0.12 ± 3% perf-profile.self.cycles-pp.memset_erms
0.20 ± 3% -0.1 0.08 ± 6% perf-profile.self.cycles-pp.avc_has_perm
0.20 ± 2% -0.1 0.08 ± 6% perf-profile.self.cycles-pp.posix_lock_inode
0.20 ± 6% -0.1 0.08 perf-profile.self.cycles-pp.___might_sleep
0.16 ± 4% -0.1 0.05 ± 9% perf-profile.self.cycles-pp.__fget
0.15 ± 8% -0.1 0.06 perf-profile.self.cycles-pp.fcntl_setlk
0.24 -0.1 0.15 ± 2% perf-profile.self.cycles-pp.kmem_cache_free
0.11 -0.1 0.03 ±100% perf-profile.self.cycles-pp.__x64_sys_fcntl
0.11 ± 4% -0.1 0.03 ±100% perf-profile.self.cycles-pp.__might_sleep
0.13 -0.1 0.05 perf-profile.self.cycles-pp.locks_alloc_lock
0.13 ± 5% -0.1 0.05 perf-profile.self.cycles-pp.file_has_perm
0.07 ± 7% -0.0 0.05 perf-profile.self.cycles-pp.locks_free_lock
93.64 +3.3 96.89 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
will-it-scale.per_thread_ops
75000 +-+-----------------------------------------------------------------+
70000 +-+.. .+.+.+..+.+.+.+.. .+. .+.. .+.+. .+. .+.+..+.+.+.+..+.|
| + +. .+.+. + + +. + |
65000 +-+ + |
60000 +-+ |
55000 +-+ |
50000 +-+ |
| |
45000 +-+ |
40000 +-+ |
35000 +-+ |
30000 +-+ |
| |
25000 O-O O O O O O O O O O O O O O O O O O O O O O |
20000 +-+-----------------------------------------------------------------+
will-it-scale.workload
6.5e+06 +-+---------------------------------------------------------------+
|.+. .+.+.+.+ + +. .+. .+. .+.+. .+. .+.+ +.+..+.+ |
6e+06 +-+ +. +.+..+ + +. + +. |
5.5e+06 +-+ |
| |
5e+06 +-+ |
4.5e+06 +-+ |
| |
4e+06 +-+ |
3.5e+06 +-+ |
| |
3e+06 +-+ |
2.5e+06 +-+ |
O O O O O O O O O O O O O O O O O O O O O |
2e+06 +-+--------O-O----------------------------------------------------+
will-it-scale.time.user_time
300 +-+-------------------------------------------------------------------+
280 +-+ .+.. .+. .+.. .+. .+.|
|.+..+.+.+..+.+ + +.. .+.+.+..+.+.+..+.+.+..+.+ + +. |
260 +-+ +.+.+. |
240 +-+ |
220 +-+ |
200 +-+ |
| |
180 +-+ |
160 +-+ |
140 +-+ O |
120 +-+ |
O O O O O |
100 +-+ O O O O O O O O O O O O O O O O O |
80 +-+-------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.20.0-rc2-00008-g3c19f23" of type "text/plain" (168529 bytes)
View attachment "job-script" of type "text/plain" (7182 bytes)
View attachment "job.yaml" of type "text/plain" (4804 bytes)
View attachment "reproduce" of type "text/plain" (308 bytes)