Message-ID: <20160128213833.3a7b9714@synchrony.poochiereds.net>
Date: Thu, 28 Jan 2016 21:38:33 -0500
From: Jeff Layton <jeff.layton@...marydata.com>
To: kernel test robot <ying.huang@...ux.intel.com>
Cc: lkp@...org, LKML <linux-kernel@...r.kernel.org>,
"J. Bruce Fields" <bfields@...ldses.org>,
Dmitry Vyukov <dvyukov@...gle.com>,
Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [lkp] [locks] 7f3697e24d: +35.1% will-it-scale.per_thread_ops
On Fri, 29 Jan 2016 09:32:19 +0800
kernel test robot <ying.huang@...ux.intel.com> wrote:
> FYI, we noticed the below changes on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> commit 7f3697e24dc3820b10f445a4a7d914fc356012d1 ("locks: fix unlock when fcntl_setlk races with a close")
>
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
> gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-snb01/lock1/will-it-scale
>
> commit:
> 9189922675ecca0fab38931d86b676e9d79602dc
> 7f3697e24dc3820b10f445a4a7d914fc356012d1
>
> 9189922675ecca0f 7f3697e24dc3820b10f445a4a7
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 2376432 ± 0% +2.1% 2427484 ± 0% will-it-scale.per_process_ops
> 807889 ± 0% +35.1% 1091496 ± 4% will-it-scale.per_thread_ops
> 22.08 ± 2% +89.1% 41.75 ± 5% will-it-scale.time.user_time
> 1238371 ± 14% +100.4% 2481345 ± 39% cpuidle.C1E-SNB.time
> 3098 ± 57% -66.6% 1035 ±171% numa-numastat.node1.other_node
> 379.25 ± 8% -21.4% 298.00 ± 12% numa-vmstat.node0.nr_alloc_batch
> 22.08 ± 2% +89.1% 41.75 ± 5% time.user_time
> 1795 ± 4% +7.5% 1930 ± 2% vmstat.system.cs
> 0.54 ± 5% +136.9% 1.28 ± 10% perf-profile.cycles.___might_sleep.__might_sleep.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file
> 1.65 ± 57% +245.2% 5.70 ± 29% perf-profile.cycles.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.58 ± 59% +248.3% 5.50 ± 31% perf-profile.cycles.__fget.__fget_light.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.62 ± 58% +246.3% 5.63 ± 30% perf-profile.cycles.__fget_light.__fdget_raw.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 5.88 ± 11% perf-profile.cycles.__memset.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 2.50 ± 2% -100.0% 0.00 ± -1% perf-profile.cycles.__memset.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk
> 1.29 ± 4% +138.8% 3.09 ± 11% perf-profile.cycles.__memset.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.47 ± 9% +144.4% 1.16 ± 11% perf-profile.cycles.__might_fault.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.37 ± 12% +140.3% 0.90 ± 9% perf-profile.cycles.__might_sleep.__might_fault.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.86 ± 6% +137.7% 2.05 ± 10% perf-profile.cycles.__might_sleep.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file
> 0.61 ± 14% +56.8% 0.95 ± 14% perf-profile.cycles.__might_sleep.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.sys_fcntl
> 0.00 ± -1% +Inf% 39.84 ± 12% perf-profile.cycles.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk.sys_fcntl
> 16.44 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 1.77 ± 11% perf-profile.cycles._raw_spin_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 59.34 ± 1% -72.4% 16.36 ± 33% perf-profile.cycles._raw_spin_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.46 ± 11% +144.9% 1.13 ± 19% perf-profile.cycles.avc_has_perm.inode_has_perm.file_has_perm.selinux_file_fcntl.security_file_fcntl
> 0.87 ± 6% +103.2% 1.77 ± 12% perf-profile.cycles.avc_has_perm.inode_has_perm.file_has_perm.selinux_file_lock.security_file_lock
> 0.81 ± 4% +135.7% 1.90 ± 10% perf-profile.cycles.copy_user_generic_string.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 41.86 ± 12% perf-profile.cycles.do_lock_file_wait.part.29.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.88 ± 6% +127.8% 2.00 ± 9% perf-profile.cycles.entry_SYSCALL_64
> 0.86 ± 4% +122.6% 1.92 ± 12% perf-profile.cycles.entry_SYSCALL_64_after_swapgs
> 84.98 ± 0% -9.1% 77.20 ± 2% perf-profile.cycles.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.76 ± 10% +142.1% 1.84 ± 14% perf-profile.cycles.file_has_perm.selinux_file_fcntl.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.35 ± 4% +106.3% 2.78 ± 11% perf-profile.cycles.file_has_perm.selinux_file_lock.security_file_lock.fcntl_setlk.sys_fcntl
> 0.00 ± -1% +Inf% 0.89 ± 12% perf-profile.cycles.flock_to_posix_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 6.90 ± 4% -48.6% 3.55 ± 27% perf-profile.cycles.fput.entry_SYSCALL_64_fastpath
> 0.51 ± 10% +140.5% 1.23 ± 16% perf-profile.cycles.inode_has_perm.isra.31.file_has_perm.selinux_file_fcntl.security_file_fcntl.sys_fcntl
> 0.98 ± 4% +97.7% 1.93 ± 11% perf-profile.cycles.inode_has_perm.isra.31.file_has_perm.selinux_file_lock.security_file_lock.fcntl_setlk
> 0.00 ± -1% +Inf% 6.56 ± 10% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 2.75 ± 4% -100.0% 0.00 ± -1% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk
> 1.53 ± 7% +119.7% 3.37 ± 13% perf-profile.cycles.kmem_cache_alloc.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 1.79 ± 11% perf-profile.cycles.kmem_cache_free.locks_free_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 0.46 ± 14% +257.0% 1.66 ± 11% perf-profile.cycles.kmem_cache_free.locks_free_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.40 ± 7% +158.6% 1.05 ± 17% perf-profile.cycles.kmem_cache_free.locks_free_lock.locks_dispose_list.__posix_lock_file.vfs_lock_file
> 0.00 ± -1% +Inf% 0.96 ± 10% perf-profile.cycles.lg_local_lock.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 0.00 ± -1% +Inf% 14.69 ± 10% perf-profile.cycles.locks_alloc_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 6.38 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.locks_alloc_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 3.28 ± 6% +127.1% 7.45 ± 12% perf-profile.cycles.locks_alloc_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 9.75 ± 13% perf-profile.cycles.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 3.61 ± 1% -100.0% 0.00 ± -1% perf-profile.cycles.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 0.00 ± -1% +Inf% 1.84 ± 11% perf-profile.cycles.locks_dispose_list.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 0.00 ± -1% +Inf% 2.42 ± 10% perf-profile.cycles.locks_free_lock.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 1.00 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.locks_free_lock.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 0.63 ± 11% +224.1% 2.05 ± 10% perf-profile.cycles.locks_free_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 1.22 ± 14% perf-profile.cycles.locks_free_lock.locks_dispose_list.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 0.00 ± -1% +Inf% 6.17 ± 15% perf-profile.cycles.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait.fcntl_setlk
> 2.31 ± 6% -100.0% 0.00 ± -1% perf-profile.cycles.locks_insert_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk.sys_fcntl
> 0.00 ± -1% +Inf% 8.96 ± 13% perf-profile.cycles.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.do_lock_file_wait
> 3.27 ± 1% -100.0% 0.00 ± -1% perf-profile.cycles.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file.fcntl_setlk
> 53.88 ± 1% -79.7% 10.92 ± 46% perf-profile.cycles.native_queued_spin_lock_slowpath._raw_spin_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 2.75 ± 0% +183.3% 7.79 ± 13% perf-profile.cycles.put_pid.locks_unlink_lock_ctx.locks_delete_lock_ctx.__posix_lock_file.vfs_lock_file
> 1.11 ± 9% +137.2% 2.63 ± 14% perf-profile.cycles.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.69 ± 4% +118.2% 3.69 ± 11% perf-profile.cycles.security_file_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.91 ± 9% +139.0% 2.17 ± 14% perf-profile.cycles.selinux_file_fcntl.security_file_fcntl.sys_fcntl.entry_SYSCALL_64_fastpath
> 1.39 ± 4% +114.6% 2.97 ± 10% perf-profile.cycles.selinux_file_lock.security_file_lock.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 0.00 ± -1% +Inf% 41.12 ± 12% perf-profile.cycles.vfs_lock_file.do_lock_file_wait.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 17.04 ± 3% -100.0% 0.00 ± -1% perf-profile.cycles.vfs_lock_file.fcntl_setlk.sys_fcntl.entry_SYSCALL_64_fastpath
> 34.75 ±148% +132.4% 80.75 ± 82% sched_debug.cfs_rq:/.load.8
> 15.00 ± 9% +198.3% 44.75 ± 72% sched_debug.cfs_rq:/.load_avg.21
> 25.00 ± 29% +574.0% 168.50 ± 78% sched_debug.cfs_rq:/.load_avg.9
> 38.47 ± 5% +29.1% 49.65 ± 26% sched_debug.cfs_rq:/.load_avg.avg
> 63.17 ± 10% +44.3% 91.16 ± 36% sched_debug.cfs_rq:/.load_avg.stddev
> 893865 ± 12% -12.5% 782455 ± 0% sched_debug.cfs_rq:/.min_vruntime.25
> 18.25 ± 26% +52.1% 27.75 ± 25% sched_debug.cfs_rq:/.runnable_load_avg.9
> -57635 ±-68% -196.4% 55548 ±130% sched_debug.cfs_rq:/.spread0.1
> -802264 ±-25% -29.5% -565458 ±-49% sched_debug.cfs_rq:/.spread0.8
> -804662 ±-25% -29.4% -567811 ±-48% sched_debug.cfs_rq:/.spread0.min
> 1233 ± 5% +30.9% 1614 ± 28% sched_debug.cfs_rq:/.tg_load_avg.0
> 1233 ± 5% +30.9% 1614 ± 28% sched_debug.cfs_rq:/.tg_load_avg.1
> 1228 ± 5% +30.3% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.10
> 1228 ± 5% +30.4% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.11
> 1228 ± 5% +30.3% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.12
> 1229 ± 5% +30.0% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.13
> 1228 ± 5% +30.1% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.14
> 1229 ± 5% +30.0% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.15
> 1226 ± 5% +30.3% 1598 ± 27% sched_debug.cfs_rq:/.tg_load_avg.16
> 1226 ± 5% +30.2% 1597 ± 27% sched_debug.cfs_rq:/.tg_load_avg.17
> 1227 ± 5% +30.1% 1595 ± 27% sched_debug.cfs_rq:/.tg_load_avg.18
> 1227 ± 5% +29.4% 1588 ± 26% sched_debug.cfs_rq:/.tg_load_avg.19
> 1233 ± 5% +30.4% 1609 ± 27% sched_debug.cfs_rq:/.tg_load_avg.2
> 1222 ± 5% +29.9% 1587 ± 26% sched_debug.cfs_rq:/.tg_load_avg.20
> 1223 ± 5% +24.2% 1519 ± 20% sched_debug.cfs_rq:/.tg_load_avg.21
> 1223 ± 5% +23.8% 1515 ± 20% sched_debug.cfs_rq:/.tg_load_avg.22
> 1223 ± 5% +23.9% 1515 ± 20% sched_debug.cfs_rq:/.tg_load_avg.23
> 1223 ± 5% +23.9% 1515 ± 20% sched_debug.cfs_rq:/.tg_load_avg.24
> 1223 ± 5% +23.5% 1511 ± 19% sched_debug.cfs_rq:/.tg_load_avg.25
> 1224 ± 5% +23.5% 1512 ± 19% sched_debug.cfs_rq:/.tg_load_avg.26
> 1223 ± 5% +23.1% 1506 ± 19% sched_debug.cfs_rq:/.tg_load_avg.27
> 1223 ± 5% +22.5% 1499 ± 19% sched_debug.cfs_rq:/.tg_load_avg.28
> 1224 ± 5% +22.5% 1499 ± 19% sched_debug.cfs_rq:/.tg_load_avg.29
> 1233 ± 5% +30.3% 1607 ± 27% sched_debug.cfs_rq:/.tg_load_avg.3
> 1223 ± 5% +22.2% 1495 ± 18% sched_debug.cfs_rq:/.tg_load_avg.30
> 1224 ± 5% +22.0% 1493 ± 19% sched_debug.cfs_rq:/.tg_load_avg.31
> 1234 ± 5% +30.0% 1604 ± 28% sched_debug.cfs_rq:/.tg_load_avg.4
> 1233 ± 5% +30.0% 1604 ± 28% sched_debug.cfs_rq:/.tg_load_avg.5
> 1231 ± 5% +30.3% 1604 ± 28% sched_debug.cfs_rq:/.tg_load_avg.6
> 1233 ± 5% +30.0% 1603 ± 27% sched_debug.cfs_rq:/.tg_load_avg.7
> 1231 ± 5% +30.1% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.8
> 1228 ± 5% +30.3% 1601 ± 27% sched_debug.cfs_rq:/.tg_load_avg.9
> 1228 ± 5% +27.8% 1569 ± 24% sched_debug.cfs_rq:/.tg_load_avg.avg
> 1246 ± 5% +30.7% 1628 ± 27% sched_debug.cfs_rq:/.tg_load_avg.max
> 1212 ± 5% +22.2% 1481 ± 19% sched_debug.cfs_rq:/.tg_load_avg.min
> 15.00 ± 9% +198.3% 44.75 ± 72% sched_debug.cfs_rq:/.tg_load_avg_contrib.21
> 25.00 ± 29% +574.0% 168.50 ± 78% sched_debug.cfs_rq:/.tg_load_avg_contrib.9
> 38.53 ± 5% +29.0% 49.71 ± 26% sched_debug.cfs_rq:/.tg_load_avg_contrib.avg
> 63.34 ± 10% +44.1% 91.30 ± 36% sched_debug.cfs_rq:/.tg_load_avg_contrib.stddev
> 532.25 ± 2% +8.5% 577.50 ± 6% sched_debug.cfs_rq:/.util_avg.15
> 210.75 ± 14% -14.4% 180.50 ± 4% sched_debug.cfs_rq:/.util_avg.29
> 450.00 ± 22% +50.7% 678.00 ± 18% sched_debug.cfs_rq:/.util_avg.9
> 955572 ± 4% -10.2% 857813 ± 5% sched_debug.cpu.avg_idle.6
> 23.99 ± 60% -76.2% 5.71 ± 24% sched_debug.cpu.clock.stddev
> 23.99 ± 60% -76.2% 5.71 ± 24% sched_debug.cpu.clock_task.stddev
> 2840 ± 37% -47.4% 1492 ± 65% sched_debug.cpu.curr->pid.25
> 34.75 ±148% +132.4% 80.75 ± 82% sched_debug.cpu.load.8
> 61776 ± 7% -7.1% 57380 ± 0% sched_debug.cpu.nr_load_updates.25
> 6543 ± 2% +20.4% 7879 ± 9% sched_debug.cpu.nr_switches.0
> 5256 ± 23% +177.1% 14566 ± 52% sched_debug.cpu.nr_switches.27
> 7915 ± 3% +8.7% 8605 ± 3% sched_debug.cpu.nr_switches.avg
> -0.25 ±-519% +1900.0% -5.00 ±-24% sched_debug.cpu.nr_uninterruptible.12
> 2.00 ± 93% -125.0% -0.50 ±-300% sched_debug.cpu.nr_uninterruptible.24
> 17468 ± 14% +194.3% 51413 ± 75% sched_debug.cpu.sched_count.15
> 2112 ± 2% +20.8% 2552 ± 11% sched_debug.cpu.sched_goidle.0
> 2103 ± 34% +219.0% 6709 ± 55% sched_debug.cpu.sched_goidle.27
> 3159 ± 3% +8.2% 3418 ± 4% sched_debug.cpu.sched_goidle.avg
> 1323 ± 64% -72.7% 361.50 ± 15% sched_debug.cpu.ttwu_count.23
> 3264 ± 12% +94.4% 6347 ± 41% sched_debug.cpu.ttwu_count.27
> 3860 ± 3% +9.0% 4208 ± 3% sched_debug.cpu.ttwu_count.avg
> 2358 ± 3% +28.7% 3035 ± 9% sched_debug.cpu.ttwu_local.0
> 1110 ± 22% +54.6% 1716 ± 28% sched_debug.cpu.ttwu_local.27
> 1814 ± 8% +16.1% 2106 ± 5% sched_debug.cpu.ttwu_local.stddev
>
>
> lkp-snb01: Sandy Bridge-EP
> Memory: 32G
>
> will-it-scale.per_thread_ops
>
> [ASCII plot: y-axis from 750000 to 1.2e+06. Bisect-good samples (*) sit near 800000; bisect-bad samples (O) lie between about 1.05e+06 and 1.2e+06.]
>
>
> will-it-scale.time.user_time
>
> [ASCII plot: y-axis from 20 to 50. Bisect-good samples (*) sit between about 20 and 25; bisect-bad samples (O) lie between about 40 and 45.]
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
Thanks...
Huh...I'm stumped on this one. If anything I would have expected better
performance with this patch since we don't even take the file_lock or
do the fcheck in the F_UNLCK codepath now, or when there is an error.
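For anyone who wants to poke at the codepath in question from userspace, here is a minimal sketch of a lock/unlock loop on a scratch file. It is an assumption that the lock1 testcase looks roughly like this; the function name lock_unlock_loop and the iteration count are made up for illustration, and this is not the benchmark's actual source.

```python
import fcntl
import tempfile


def lock_unlock_loop(iterations: int) -> int:
    """Take and release a POSIX write lock on a scratch file; return the op count."""
    ops = 0
    # TemporaryFile is opened read/write, which a write lock requires.
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        for _ in range(iterations):
            # Non-blocking exclusive lock over the whole file
            # (an fcntl() F_SETLK request with l_type = F_WRLCK).
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            # Release it again (an fcntl() request with l_type = F_UNLCK),
            # which is the unlock path the patch touches.
            fcntl.lockf(fd, fcntl.LOCK_UN)
            ops += 1
    return ops


if __name__ == "__main__":
    print(lock_unlock_loop(100000))
```

Running several copies of this as threads sharing one process, versus as separate processes, is only a rough analogue of the per_thread_ops and per_process_ops cases; Python's interpreter lock will skew any absolute numbers.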
I'll see if I can reproduce it on my own test rig, but I'd welcome
ideas of where and how this performance regression could have crept in.
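As a sanity check on how to read the %change column, the sketch below recomputes it from the two headline rows of the report, assuming the column is simply (new - base) / base. The helper name percent_change is hypothetical; the input values are copied from the table above.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# Values from the report: parent commit 9189922675ec vs. 7f3697e24dc3.
per_thread = percent_change(807889, 1091496)
per_process = percent_change(2376432, 2427484)

print(f"per_thread_ops:  {per_thread:+.1f}%")   # +35.1%
print(f"per_process_ops: {per_process:+.1f}%")  # +2.1%
```

Under that assumption a positive value means the ops count went up with the patch applied.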
--
Jeff Layton <jeff.layton@...marydata.com>