Message-ID: <9FF32F53-5EF8-40D4-B696-A30FDF7201E1@zytor.com>
Date: Tue, 16 Aug 2016 09:59:00 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: kernel test robot <xiaolong.ye@...el.com>,
Ville Syrjälä <ville.syrjala@...ux.intel.com>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
Borislav Petkov <bp@...e.de>,
Andy Lutomirski <luto@...capital.net>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp] [x86/hweight] 65ea11ec6a: will-it-scale.per_process_ops 9.3% improvement
On August 16, 2016 7:26:43 AM PDT, kernel test robot <xiaolong.ye@...el.com> wrote:
>
>FYI, we noticed a 9.3% improvement of will-it-scale.per_process_ops due to commit:
>
>commit 65ea11ec6a82b1d44aba62b59e9eb20247e57c6e ("x86/hweight: Don't clobber %rdi")
>https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
>in testcase: will-it-scale
>on test machine: 32 threads Sandy Bridge-EP with 64G memory
>with following parameters:
>
> test: unix1
> cpufreq_governor: performance
>
>
>Disclaimer:
>Results have been estimated based on internal Intel analysis and are provided
>for informational purposes only. Any difference in system hardware or software
>design or configuration may affect actual performance.
>
>Details are as below:
>-------------------------------------------------------------------------------------------------->
>
>
>To reproduce:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
>=========================================================================================
>compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
>gcc-6/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-sb03/unix1/will-it-scale
>
>commit:
> v4.8-rc1
> 65ea11ec6a ("x86/hweight: Don't clobber %rdi")
>
>        v4.8-rc1 65ea11ec6a82b1d44aba62b59e
>---------------- --------------------------
>       fail:runs  %reproduction    fail:runs
>           |             |             |
>          1:8          -12%            :4     last_state.is_incomplete_run
>          4:8          -50%            :4     kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
>          7:8          -88%            :4     kmsg.drm:drm_edid_block_valid[drm]]*ERROR*EDID_checksum_is_invalid,remainder_is
>          7:8          -88%            :4     kmsg.i8042:Can't_read_CTR_while_initializing_i8042
>         %stddev     %change         %stddev
>             \          |                \
>   1063041 ±  0%      +9.3%    1161810 ±  0%  will-it-scale.per_process_ops
>    976004 ±  0%      +9.0%    1063615 ±  0%  will-it-scale.per_thread_ops
>      0.57 ±  0%      -6.7%       0.53 ±  1%  will-it-scale.scalability
>    175.96 ±  0%      +8.0%     190.10 ±  0%  will-it-scale.time.user_time
>      0.00 ± 20%     -31.5%       0.00 ± 26%  sched_debug.cpu.next_balance.stddev
>    101.14 ± 11%   +9639.4%       9850 ±121%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
>    148.57 ± 15%  +57704.4%      85880 ±125%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
>    886.00 ± 14%   +9757.0%      87333 ±123%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
> 3.041e+12 ±  1%      +7.4%  3.267e+12 ±  1%  perf-stat.branch-instructions
>      0.31 ±  0%     -86.6%       0.04 ±  4%  perf-stat.branch-miss-rate
> 9.456e+09 ±  1%     -85.6%  1.364e+09 ±  3%  perf-stat.branch-misses
> 5.147e+12 ±  1%      +5.4%  5.427e+12 ±  1%  perf-stat.dTLB-loads
> 3.869e+12 ±  0%      +6.7%  4.128e+12 ±  1%  perf-stat.dTLB-stores
>     29.02 ± 13%    +223.2%      93.80 ±  0%  perf-stat.iTLB-load-miss-rate
> 2.353e+08 ± 21%    +733.0%   1.96e+09 ±  0%  perf-stat.iTLB-load-misses
>   5.7e+08 ±  9%     -77.2%  1.297e+08 ± 10%  perf-stat.iTLB-loads
> 1.696e+13 ±  0%      +6.9%  1.814e+13 ±  0%  perf-stat.instructions
>     75030 ± 18%     -87.7%       9251 ±  1%  perf-stat.instructions-per-iTLB-miss
>      1.04 ±  0%      +7.6%       1.12 ±  1%  perf-stat.ipc
>  24064971 ±  3%      -6.6%   22469931 ±  3%  perf-stat.node-load-misses
>  53705459 ±  1%      -3.1%   52034054 ±  2%  perf-stat.node-loads
>      7.32 ±  5%     +23.3%       9.03 ±  4%  perf-profile.cycles.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
>      1.29 ±  4%     +11.7%       1.44 ±  5%  perf-profile.cycles.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
>      1.15 ±  4%     +12.1%       1.29 ±  4%  perf-profile.cycles.__fget.__fget_light.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
>      1.22 ±  5%     +11.7%       1.36 ±  5%  perf-profile.cycles.__fget_light.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
>      1.86 ±  4%     -58.4%       0.77 ±  7%  perf-profile.cycles.__inode_security_revalidate.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write
>      0.00 ± -1%      +Inf%       2.65 ±  5%  perf-profile.cycles.__kmalloc_node_track_caller.__kmalloc_reserve.isra.33.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>      1.89 ±  8%    -100.0%       0.00 ± -1%  perf-profile.cycles.__kmalloc_node_track_caller.__kmalloc_reserve.isra.35.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>      0.00 ± -1%      +Inf%       3.55 ±  5%  perf-profile.cycles.__kmalloc_reserve.isra.33.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>      2.52 ±  8%    -100.0%       0.00 ± -1%  perf-profile.cycles.__kmalloc_reserve.isra.35.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>      1.43 ±  4%     -91.1%       0.13 ±173%  perf-profile.cycles.__might_sleep.__inode_security_revalidate.selinux_file_permission.security_file_permission.rw_verify_area
>      1.15 ±  5%     -65.7%       0.40 ± 57%  perf-profile.cycles.__might_sleep.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      1.33 ±  7%     +14.0%       1.52 ±  2%  perf-profile.cycles._raw_spin_lock_irqsave.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      1.37 ±  6%     +20.4%       1.65 ±  3%  perf-profile.cycles._raw_spin_lock_irqsave.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      1.09 ±  9%     +15.6%       1.26 ±  5%  perf-profile.cycles._raw_spin_unlock_irqrestore.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      1.01 ±  6%     +15.4%       1.17 ±  7%  perf-profile.cycles._raw_spin_unlock_irqrestore.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      8.01 ±  6%     +22.5%       9.82 ±  4%  perf-profile.cycles.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      7.33 ±  6%     +14.8%       8.42 ±  4%  perf-profile.cycles.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>      0.98 ±  8%     +15.0%       1.12 ±  4%  perf-profile.cycles.consume_skb.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
>      1.60 ±  5%     +18.7%       1.91 ±  3%  perf-profile.cycles.copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      2.30 ±  4%     +11.5%       2.56 ±  6%  perf-profile.cycles.entry_SYSCALL_64
>      2.10 ±  3%     +18.1%       2.48 ±  5%  perf-profile.cycles.entry_SYSCALL_64_after_swapgs
>      2.82 ±  7%     -34.6%       1.85 ±  6%  perf-profile.cycles.file_has_perm.selinux_file_permission.security_file_permission.rw_verify_area.vfs_read
>      1.55 ±  6%     +21.3%       1.89 ±  5%  perf-profile.cycles.fput.entry_SYSCALL_64_fastpath
>      1.13 ±  9%     +17.0%       1.32 ±  3%  perf-profile.cycles.kfree.skb_free_head.skb_release_data.skb_release_all.consume_skb
>      0.76 ±  8%     +21.9%       0.93 ±  5%  perf-profile.cycles.kfree_skbmem.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      0.77 ± 10%     +27.0%       0.98 ±  5%  perf-profile.cycles.ksize.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>      2.08 ±  6%     -31.5%       1.42 ±  6%  perf-profile.cycles.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>      0.89 ±  9%     +18.8%       1.06 ±  6%  perf-profile.cycles.mutex_unlock.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
>      6.80 ±  3%     -19.3%       5.49 ±  3%  perf-profile.cycles.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
>      5.54 ±  4%     -23.5%       4.24 ±  5%  perf-profile.cycles.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
>      6.21 ±  4%     -19.5%       5.00 ±  3%  perf-profile.cycles.security_file_permission.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
>      5.23 ±  4%     -25.6%       3.89 ±  5%  perf-profile.cycles.security_file_permission.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
>      4.67 ±  4%     -24.1%       3.55 ±  4%  perf-profile.cycles.selinux_file_permission.security_file_permission.rw_verify_area.vfs_read.sys_read
>      4.87 ±  5%     -28.0%       3.51 ±  5%  perf-profile.cycles.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write.sys_write
>      2.43 ±  5%     +29.8%       3.15 ±  3%  perf-profile.cycles.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
>      1.18 ±  8%     +16.1%       1.36 ±  2%  perf-profile.cycles.skb_free_head.skb_release_data.skb_release_all.consume_skb.unix_stream_read_generic
>      2.60 ±  7%     +15.4%       3.00 ±  3%  perf-profile.cycles.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
>      6.30 ±  6%     +15.2%       7.26 ±  4%  perf-profile.cycles.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      1.45 ±  7%     +19.4%       1.73 ±  2%  perf-profile.cycles.skb_release_data.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
>      4.63 ±  6%     +14.4%       5.30 ±  5%  perf-profile.cycles.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
>      1.01 ±  4%     +16.7%       1.18 ±  5%  perf-profile.cycles.skb_set_owner_w.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      2.59 ±  6%     +18.2%       3.07 ±  4%  perf-profile.cycles.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
>      9.66 ±  5%     +21.1%      11.70 ±  3%  perf-profile.cycles.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
>     25.86 ±  5%     +14.8%      29.68 ±  4%  perf-profile.cycles.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write.sys_write
>      3.88 ±  7%     +13.1%       4.38 ±  5%  perf-profile.cycles.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb
>      4.24 ±  7%     +13.3%       4.80 ±  5%  perf-profile.cycles.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic
>     21.96 ±  5%     +17.1%      25.71 ±  3%  perf-profile.cycles.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write
>      1.20 ±  6%    -100.0%       0.00 ± -1%  perf-profile.cycles.unix_stream_sendmsg.sock_write_iter.__vfs_write.vfs_write.sys_write
>      2.28 ±  6%     +13.7%       2.60 ±  3%  perf-profile.cycles.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all
>      3.84 ±  5%     -16.8%       3.20 ±  2%  perf-profile.func.cycles.___might_sleep
>      1.96 ±  7%     +20.8%       2.36 ±  4%  perf-profile.func.cycles.__alloc_skb
>      2.40 ±  4%     +11.3%       2.67 ±  4%  perf-profile.func.cycles.__fget
>      1.30 ±  9%     +48.7%       1.94 ±  4%  perf-profile.func.cycles.__kmalloc_node_track_caller
>      1.05 ±  5%     +12.6%       1.19 ±  7%  perf-profile.func.cycles.__vfs_read
>      0.99 ±  7%     +27.1%       1.26 ±  4%  perf-profile.func.cycles.__vfs_write
>      1.01 ±  5%     -51.9%       0.48 ±  3%  perf-profile.func.cycles._cond_resched
>      2.78 ±  6%     +17.0%       3.25 ±  2%  perf-profile.func.cycles._raw_spin_lock_irqsave
>      2.19 ±  8%     +15.5%       2.53 ±  6%  perf-profile.func.cycles._raw_spin_unlock_irqrestore
>      1.10 ±  8%     +11.2%       1.23 ±  4%  perf-profile.func.cycles.consume_skb
>      0.97 ±  5%     +25.6%       1.22 ±  3%  perf-profile.func.cycles.copy_from_iter
>      2.30 ±  4%     +11.5%       2.56 ±  6%  perf-profile.func.cycles.entry_SYSCALL_64
>      2.10 ±  3%     +18.1%       2.48 ±  5%  perf-profile.func.cycles.entry_SYSCALL_64_after_swapgs
>      2.26 ±  4%     -38.4%       1.39 ±  5%  perf-profile.func.cycles.file_has_perm
>      1.55 ±  6%     +21.3%       1.89 ±  5%  perf-profile.func.cycles.fput
>      1.18 ±  8%     +17.2%       1.38 ±  3%  perf-profile.func.cycles.kfree
>      0.86 ± 10%     +22.0%       1.05 ±  4%  perf-profile.func.cycles.ksize
>      0.90 ±  8%     +18.7%       1.06 ±  5%  perf-profile.func.cycles.mutex_unlock
>      1.91 ±  6%     -13.1%       1.66 ±  3%  perf-profile.func.cycles.selinux_file_permission
>      1.05 ±  5%     +16.7%       1.23 ±  5%  perf-profile.func.cycles.skb_set_owner_w
>      1.66 ±  8%     +16.3%       1.93 ±  7%  perf-profile.func.cycles.sock_wfree
>      2.44 ±  4%     -39.7%       1.47 ±  2%  perf-profile.func.cycles.sock_write_iter
>      4.20 ±  6%     -21.1%       3.32 ±  3%  perf-profile.func.cycles.unix_stream_sendmsg
>      2.35 ±  6%     +14.3%       2.69 ±  3%  perf-profile.func.cycles.unix_write_space
>
>
>
>Thanks,
>Xiaolong
Dang...
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.