Message-ID: <20161107025038.GE21529@yexl-desktop>
Date: Mon, 7 Nov 2016 10:50:38 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Borislav Petkov <bp@...e.de>
Cc: Ingo Molnar <mingo@...nel.org>, Andy Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, tipbuild@...or.com,
lkp@...org
Subject: [lkp] [x86/copy_user] adb402cd14: will-it-scale.per_process_ops
-12.7% regression
Greetings,
FYI, we noticed a -12.7% regression of will-it-scale.per_process_ops due to commit:
commit adb402cd1461eef6e1a21db4532a3b9e6a6be853 ("x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants")
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/asm
in testcase: will-it-scale
on test machine: 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 4G memory
with the following parameters:
test: poll1
cpufreq_governor: performance
Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
In addition, the commit also has a significant impact on the following test:
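For readers unfamiliar with this kind of workload, the following is a minimal sketch of a poll-style microbenchmark loop. It is not the will-it-scale poll1 source; the function name poll_ops, the use of pipes, and the default of one descriptor are assumptions made for illustration. Each poll() call hands the kernel an array of descriptors to inspect, which is why the user-copy path under do_sys_poll appears in the profile further down.

```python
import os
import select
import time


def poll_ops(duration_s=0.1, nfds=1):
    """Count zero-timeout poll() calls completed in duration_s seconds.

    Hypothetical helper: nfds empty pipes are polled for readability,
    so every call returns immediately with no ready descriptors.
    """
    pipes = [os.pipe() for _ in range(nfds)]
    poller = select.poll()
    for read_fd, _write_fd in pipes:
        poller.register(read_fd, select.POLLIN)

    ops = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            poller.poll(0)  # timeout of 0 ms: never blocks
            ops += 1
    finally:
        # Close both ends of every pipe so no descriptors leak.
        for read_fd, write_fd in pipes:
            os.close(read_fd)
            os.close(write_fd)
    return ops


if __name__ == "__main__":
    print("poll() calls completed:", poll_ops())
```

The absolute count from a Python loop is dominated by interpreter overhead and is not comparable to the numbers in this report; the sketch only shows the shape of the syscall pattern being measured.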
+------------------+---------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min -11.6% regression |
| test machine | qemu-system-x86_64 -enable-kvm -cpu host -smp 4 -m 5G |
| test parameters | load=200 |
| | test=dir_rtns_1 |
+------------------+---------------------------------------------------------------------+
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/xps/poll1/will-it-scale
commit:
05b93c19d5 ("Merge branch 'linus' into x86/asm, to pick up fixes")
adb402cd14 ("x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants")
05b93c19d50af2bd adb402cd1461eef6e1a21db453
---------------- --------------------------
%stddev %change %stddev
\ | \
6754528 ± 0% -12.7% 5899566 ± 2% will-it-scale.per_process_ops
5574363 ± 0% -8.1% 5121156 ± 0% will-it-scale.per_thread_ops
577.71 ± 0% +1.9% 588.63 ± 0% will-it-scale.time.system_time
143.21 ± 0% -7.5% 132.53 ± 0% will-it-scale.time.user_time
2984 ± 3% +12.5% 3357 ± 7% cpuidle.C1E-NHM.usage
11.55 ± 4% +6.8% 12.34 ± 7% turbostat.CPU%c3
7.704e+11 ± 0% +3.2% 7.95e+11 ± 0% perf-stat.branch-instructions
0.20 ± 3% +16.3% 0.23 ± 3% perf-stat.branch-miss-rate%
1.527e+09 ± 2% +20.0% 1.833e+09 ± 3% perf-stat.branch-misses
4.543e+08 ± 3% +7.0% 4.862e+08 ± 3% perf-stat.cache-references
1.377e+12 ± 0% +5.6% 1.454e+12 ± 0% perf-stat.dTLB-loads
0.00 ± 2% -3.9% 0.00 ± 1% perf-stat.dTLB-store-miss-rate%
8.151e+11 ± 0% +5.4% 8.591e+11 ± 0% perf-stat.dTLB-stores
0.00 ± 11% -15.1% 0.00 ± 4% perf-stat.iTLB-load-miss-rate%
3.852e+12 ± 0% +5.6% 4.067e+12 ± 0% perf-stat.iTLB-loads
3.373e+12 ± 1% +5.0% 3.542e+12 ± 1% perf-stat.instructions
0.80 ± 0% +5.4% 0.84 ± 0% perf-stat.ipc
297784 ± 0% -1.1% 294651 ± 0% perf-stat.minor-faults
297784 ± 0% -1.1% 294651 ± 0% perf-stat.page-faults
230146 ± 5% +33.3% 306713 ± 21% sched_debug.cfs_rq:/.load.max
23374 ± 13% -22.2% 18188 ± 8% sched_debug.cpu.nr_load_updates.stddev
5268307 ± 41% -52.4% 2508626 ± 40% sched_debug.cpu.nr_switches.max
1749053 ± 40% -45.8% 947909 ± 29% sched_debug.cpu.nr_switches.stddev
14.50 ± 42% -47.4% 7.62 ± 30% sched_debug.cpu.nr_uninterruptible.max
10.42 ± 24% -42.1% 6.03 ± 15% sched_debug.cpu.nr_uninterruptible.stddev
5265639 ± 41% -52.4% 2505468 ± 40% sched_debug.cpu.sched_count.max
1748858 ± 40% -45.8% 947432 ± 29% sched_debug.cpu.sched_count.stddev
2627322 ± 41% -52.5% 1247788 ± 40% sched_debug.cpu.sched_goidle.max
2636214 ± 41% -52.5% 1251571 ± 40% sched_debug.cpu.ttwu_count.max
875433 ± 40% -45.8% 474432 ± 29% sched_debug.cpu.ttwu_count.stddev
2629058 ± 41% -52.6% 1245111 ± 40% sched_debug.cpu.ttwu_local.max
876396 ± 40% -45.8% 474742 ± 29% sched_debug.cpu.ttwu_local.stddev
0.00 ± -1% +Inf% 2.95 ± 11% perf-profile.calltrace.cycles-pp.___might_sleep.__might_sleep.__might_fault._copy_from_user.do_sys_poll
0.00 ± -1% +Inf% 4.78 ± 12% perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
0.00 ± -1% +Inf% 4.04 ± 11% perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_from_user.do_sys_poll.sys_poll
0.00 ± -1% +Inf% 8.23 ± 14% perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.call_cpuidle.cpu_startup_entry.rest_init.start_kernel.x86_64_start_reservations
0.00 ± -1% +Inf% 15.23 ± 13% perf-profile.calltrace.cycles-pp.copy_user_generic_string.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
20.07 ± 14% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.copy_user_generic_string.sys_poll.entry_SYSCALL_64_fastpath
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.cpu_startup_entry.rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init.start_kernel
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.rest_init
22.59 ± 13% +88.1% 42.49 ± 11% perf-profile.calltrace.cycles-pp.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
7.62 ± 9% -18.9% 6.18 ± 10% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_swapgs
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel.start_cpu
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.start_kernel.x86_64_start_reservations.x86_64_start_kernel.start_cpu
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.x86_64_start_kernel.start_cpu
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.calltrace.cycles-pp.x86_64_start_reservations.x86_64_start_kernel.start_cpu
1.81 ± 18% +174.0% 4.96 ± 12% perf-profile.children.cycles-pp.___might_sleep
3.96 ± 17% +139.4% 9.49 ± 13% perf-profile.children.cycles-pp.__might_fault
2.90 ± 15% +146.7% 7.16 ± 12% perf-profile.children.cycles-pp.__might_sleep
0.42 ± 26% +1964.5% 8.57 ± 13% perf-profile.children.cycles-pp._copy_from_user
22.75 ± 13% +87.6% 42.66 ± 11% perf-profile.children.cycles-pp.do_sys_poll
7.62 ± 9% -18.9% 6.18 ± 10% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_swapgs
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.children.cycles-pp.rest_init
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.children.cycles-pp.start_kernel
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.children.cycles-pp.x86_64_start_kernel
5.91 ± 21% +42.3% 8.41 ± 18% perf-profile.children.cycles-pp.x86_64_start_reservations
1.80 ± 18% +174.2% 4.95 ± 12% perf-profile.self.cycles-pp.___might_sleep
1.08 ± 19% +129.3% 2.48 ± 15% perf-profile.self.cycles-pp.__might_fault
1.54 ± 12% +91.9% 2.96 ± 13% perf-profile.self.cycles-pp.__might_sleep
0.42 ± 26% +681.3% 3.24 ± 15% perf-profile.self.cycles-pp._copy_from_user
14.39 ± 14% -26.1% 10.64 ± 9% perf-profile.self.cycles-pp.do_sys_poll
7.62 ± 9% -18.9% 6.18 ± 10% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_swapgs
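As a cross-check on the %change column above, the headline figures can be recomputed from the two per-commit means. This is a minimal sketch and not part of the lkp tooling; the helper name percent_change is made up for illustration, and the values are copied from the table.

```python
def percent_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# (parent 05b93c19d5, commit adb402cd14), means copied from the table above
means = {
    "will-it-scale.per_process_ops": (6754528, 5899566),
    "will-it-scale.per_thread_ops": (5574363, 5121156),
}

for metric, (base, new) in means.items():
    print(f"{metric}: {percent_change(base, new):+.1f}%")
```

Rounded to one decimal place, these come out to -12.7% and -8.1%, matching the reported values.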
[ASCII plot: will-it-scale.time.user_time per bisect sample; bisect-good (*) samples at roughly 142 to 146, bisect-bad (O) samples at roughly 128 to 134]
[ASCII plot: will-it-scale.time.system_time per bisect sample; bisect-good (*) samples at roughly 574 to 580, bisect-bad (O) samples at roughly 586 to 592]
[ASCII plot: will-it-scale.per_process_ops per bisect sample; bisect-good (*) samples at roughly 6.6e+06 to 7.1e+06, bisect-bad (O) samples at roughly 5.5e+06 to 6.2e+06]
[ASCII plot: will-it-scale.per_thread_ops per bisect sample; bisect-good (*) samples at roughly 5.5e+06 to 5.8e+06, bisect-bad (O) samples at roughly 4.9e+06 to 5.2e+06]
[*] bisect-good sample
[O] bisect-bad sample
***************************************************************************************************
vm-lkp-a03: qemu-system-x86_64 -enable-kvm -cpu host -smp 4 -m 5G
=========================================================================================
compiler/kconfig/load/rootfs/tbox_group/test/testcase:
gcc-6/x86_64-rhel-7.2/200/debian-x86_64-2016-08-31.cgz/vm-lkp-a03/dir_rtns_1/aim7
commit:
05b93c19d5 ("Merge branch 'linus' into x86/asm, to pick up fixes")
adb402cd14 ("x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants")
05b93c19d50af2bd adb402cd1461eef6e1a21db453
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
%stddev %change %stddev
\ | \
25422 ± 0% -11.6% 22469 ± 0% aim7.jobs-per-min
49.16 ± 1% +13.7% 55.90 ± 1% aim7.time.elapsed_time
49.16 ± 1% +13.7% 55.90 ± 1% aim7.time.elapsed_time.max
56074 ± 1% +18.2% 66257 ± 0% aim7.time.involuntary_context_switches
2544 ± 0% +0.9% 2566 ± 0% aim7.time.maximum_resident_set_size
102.58 ± 0% +22.4% 125.56 ± 0% aim7.time.system_time
109578 ± 12% -38.9% 66934 ± 46% sched_debug.cfs_rq:/.load.stddev
29520 ± 4% +15.4% 34074 ± 5% proc-vmstat.numa_hit
29495 ± 4% +15.4% 34047 ± 5% proc-vmstat.numa_local
43435 ± 4% +12.3% 48756 ± 5% proc-vmstat.pgalloc_normal
31786 ± 4% +15.3% 36656 ± 5% proc-vmstat.pgfault
35307 ± 11% +26.0% 44486 ± 10% proc-vmstat.pgfree
0.00 ± -1% +Inf% 168236 ±138% latency_stats.avg.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
0.00 ± -1% +Inf% 19256 ±109% latency_stats.avg.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
91562 ± 74% +638.3% 675991 ±107% latency_stats.avg.max
69313 ±121% +661.5% 527827 ±150% latency_stats.avg.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
0.00 ± -1% +Inf% 175494 ±131% latency_stats.max.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
0.00 ± -1% +Inf% 19825 ±107% latency_stats.max.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
240734 ± 29% +217.3% 763970 ± 86% latency_stats.max.max
69313 ±121% +661.5% 527827 ±150% latency_stats.max.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
0.00 ± -1% +Inf% 178824 ±127% latency_stats.sum.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
0.00 ± -1% +Inf% 25864 ±100% latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
69313 ±121% +661.5% 527827 ±150% latency_stats.sum.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
Thanks,
Xiaolong
View attachment "config-4.9.0-rc3-00275-gadb402c" of type "text/plain" (153603 bytes)
View attachment "job-script" of type "text/plain" (6535 bytes)
View attachment "job.yaml" of type "text/plain" (4212 bytes)
View attachment "reproduce" of type "text/plain" (138 bytes)