Message-ID: <202503311302.a2bb29e1-lkp@intel.com>
Date: Mon, 31 Mar 2025 14:00:31 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>, Ted Ts'o <tytso@....edu>, "Matthew
Wilcox" <willy@...radead.org>, Mateusz Guzik <mjguzik@...il.com>, Dave
Chinner <david@...morbit.com>, <linux-fsdevel@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: [linus:master] [filemap] 665575cff0: will-it-scale.per_thread_ops 3.6% improvement
Hello,
kernel test robot noticed a 3.6% improvement of will-it-scale.per_thread_ops on:
commit: 665575cff098b696995ddaddf4646a4099941f5e ("filemap: move prefaulting out of hot write path")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_task: 100%
mode: thread
test: writeseek1
cpufreq_governor: performance
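For readers unfamiliar with the workload, the following is a rough Python sketch of a writeseek-style loop. It is an illustration only, not the will-it-scale source: the loop shape is inferred from the test name and from the `write` and `llseek` entries in the perf-profile rows of this report, and `BUFLEN`, `writeseek_loop` and the `directory` parameter are assumptions introduced here.

```python
import os
import tempfile

BUFLEN = 4096  # assumed buffer size; the report does not state it


def writeseek_loop(iterations, directory=None):
    """Write BUFLEN bytes, then seek back to offset 0, `iterations` times.

    Returns the final file size so callers can confirm that every write
    landed on the same region of the file.
    """
    buf = bytes(BUFLEN)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)  # the open descriptor keeps the file alive
        for _ in range(iterations):
            written = os.write(fd, buf)
            assert written == BUFLEN
            os.lseek(fd, 0, os.SEEK_SET)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(writeseek_loop(100_000))
```

The profile attributes the write path to `shmem_file_write_iter`, so the benchmark file presumably lives on tmpfs; passing a tmpfs mount such as `/dev/shm` as `directory` would be the closest match under that assumption.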
In addition, the commit has a significant impact on the following test:
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 4.6% improvement                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=fsbuffer-w                                                                           |
+------------------+-------------------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250331/202503311302.a2bb29e1-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/writeseek1/will-it-scale
commit:
654b33ada4 ("proc: fix UAF in proc_get_inode()")
665575cff0 ("filemap: move prefaulting out of hot write path")
654b33ada4ab5e92 665575cff098b696995ddaddf46
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.171e+08 ± 11%      +30.4% 1.526e+08 ± 18% cpuidle..time
     96.67 ± 15%      -38.6%     59.33 ± 15% perf-c2c.HITM.local
     91.33 ± 22%      -37.8%     56.83 ± 18% perf-c2c.HITM.remote
  77338762             +3.6%  80146917       will-it-scale.64.threads
   1208417             +3.6%   1252295       will-it-scale.per_thread_ops
  77338762             +3.6%  80146917       will-it-scale.workload
      0.02 ±  3%       +0.0       0.03 ± 19% perf-stat.i.branch-miss-rate%
   9721738 ±  4%       +8.9%  10586240 ±  5% perf-stat.i.branch-misses
      0.02 ±  3%       +0.0       0.02 ±  5% perf-stat.overall.branch-miss-rate%
    683007             -3.3%    660149       perf-stat.overall.path-length
   9685250 ±  4%       +8.9%  10545947 ±  5% perf-stat.ps.branch-misses
31.54 -2.4 29.18 perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
40.31 -1.9 38.39 perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
46.03 -1.9 44.17 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
53.97 -1.7 52.30 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
58.43 -1.4 56.98 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
59.33 -1.4 57.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
74.17 -0.8 73.38 perf-profile.calltrace.cycles-pp.write
0.55 +0.0 0.57 perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.97 +0.0 1.01 perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
0.57 ± 3% +0.0 0.60 perf-profile.calltrace.cycles-pp.file_remove_privs_flags.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
1.01 +0.0 1.04 perf-profile.calltrace.cycles-pp.fput.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.08 +0.0 1.12 perf-profile.calltrace.cycles-pp.up_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.54 +0.0 0.59 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.llseek
0.99 +0.0 1.04 perf-profile.calltrace.cycles-pp.mutex_unlock.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.60 +0.1 1.67 perf-profile.calltrace.cycles-pp.down_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.68 ± 2% +0.1 0.75 ± 3% perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64_mg.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter
1.61 +0.1 1.69 perf-profile.calltrace.cycles-pp.mutex_lock.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.94 +0.1 1.02 perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
3.19 +0.1 3.27 perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.87 +0.1 0.96 perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.23 +0.1 2.32 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
2.21 +0.1 2.31 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.57 +0.2 1.72 perf-profile.calltrace.cycles-pp.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write
4.90 +0.2 5.08 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
0.60 ± 3% +0.2 0.80 ± 5% perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
6.44 +0.2 6.66 perf-profile.calltrace.cycles-pp.clear_bhb_loop.llseek
2.30 +0.2 2.52 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.ksys_write
2.24 +0.2 2.48 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
6.06 +0.2 6.30 perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
2.82 +0.3 3.09 perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
4.78 +0.3 5.06 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.llseek
5.77 +0.5 6.26 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.5 0.52 perf-profile.calltrace.cycles-pp.shmem_file_llseek.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
0.00 +0.5 0.54 ± 2% perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write
6.65 +0.5 7.20 perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
14.05 +0.8 14.81 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
28.53 +0.9 29.47 perf-profile.calltrace.cycles-pp.llseek
32.10 -2.4 29.69 perf-profile.children.cycles-pp.generic_perform_write
40.86 -1.9 38.96 perf-profile.children.cycles-pp.shmem_file_write_iter
46.47 -1.8 44.64 perf-profile.children.cycles-pp.vfs_write
54.38 -1.6 52.74 perf-profile.children.cycles-pp.ksys_write
72.02 -1.1 70.92 perf-profile.children.cycles-pp.do_syscall_64
73.76 -1.1 72.66 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
74.63 -0.7 73.89 perf-profile.children.cycles-pp.write
0.59 ± 2% -0.0 0.54 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.20 +0.0 0.21 perf-profile.children.cycles-pp.file_remove_privs
0.33 +0.0 0.35 perf-profile.children.cycles-pp.__f_unlock_pos
0.53 +0.0 0.54 perf-profile.children.cycles-pp.generic_file_llseek_size
0.89 +0.0 0.92 perf-profile.children.cycles-pp.testcase
2.29 +0.0 2.32 perf-profile.children.cycles-pp.fput
0.68 ± 2% +0.0 0.71 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.38 +0.0 0.42 ± 2% perf-profile.children.cycles-pp.security_file_permission
0.42 +0.0 0.46 ± 2% perf-profile.children.cycles-pp.write@plt
1.17 +0.0 1.21 perf-profile.children.cycles-pp.x64_sys_call
1.04 +0.0 1.08 perf-profile.children.cycles-pp.folio_unlock
1.14 +0.0 1.19 perf-profile.children.cycles-pp.up_write
0.63 ± 2% +0.0 0.67 perf-profile.children.cycles-pp.file_remove_privs_flags
0.53 ± 2% +0.0 0.58 perf-profile.children.cycles-pp.shmem_file_llseek
1.59 +0.1 1.65 perf-profile.children.cycles-pp.rcu_all_qs
2.19 +0.1 2.26 perf-profile.children.cycles-pp.mutex_unlock
0.75 +0.1 0.82 ± 3% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg
0.22 ± 3% +0.1 0.29 ± 2% perf-profile.children.cycles-pp.inode_to_bdi
0.44 ± 2% +0.1 0.52 ± 2% perf-profile.children.cycles-pp.xas_start
1.73 +0.1 1.80 perf-profile.children.cycles-pp.down_write
3.40 +0.1 3.48 perf-profile.children.cycles-pp.shmem_write_end
1.00 +0.1 1.08 perf-profile.children.cycles-pp.folio_mark_accessed
3.53 +0.1 3.62 perf-profile.children.cycles-pp.mutex_lock
0.65 +0.1 0.75 perf-profile.children.cycles-pp.xas_load
1.00 +0.1 1.10 perf-profile.children.cycles-pp.rw_verify_area
1.48 ± 3% +0.1 1.59 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
3.58 +0.2 3.74 perf-profile.children.cycles-pp.__cond_resched
1.71 +0.2 1.86 perf-profile.children.cycles-pp.current_time
4.70 +0.2 4.90 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
4.36 +0.2 4.57 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.74 ± 2% +0.2 0.95 ± 4% perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
2.43 +0.2 2.66 perf-profile.children.cycles-pp.inode_needs_update_time
2.38 +0.2 2.62 perf-profile.children.cycles-pp.filemap_get_entry
5.53 +0.3 5.78 perf-profile.children.cycles-pp.entry_SYSCALL_64
2.96 +0.3 3.23 perf-profile.children.cycles-pp.file_update_time
12.62 +0.5 13.10 perf-profile.children.cycles-pp.clear_bhb_loop
6.04 +0.5 6.54 perf-profile.children.cycles-pp.shmem_get_folio_gfp
6.79 +0.6 7.35 perf-profile.children.cycles-pp.shmem_write_begin
14.14 +0.8 14.90 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
28.74 +0.9 29.66 perf-profile.children.cycles-pp.llseek
0.58 ± 3% -0.0 0.54 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.13 +0.0 0.14 perf-profile.self.cycles-pp.__f_unlock_pos
0.47 +0.0 0.48 perf-profile.self.cycles-pp.generic_file_llseek_size
0.36 +0.0 0.38 perf-profile.self.cycles-pp.folio_mark_dirty
0.27 ± 2% +0.0 0.29 perf-profile.self.cycles-pp.xas_load
1.56 +0.0 1.58 perf-profile.self.cycles-pp.shmem_write_end
1.14 +0.0 1.17 perf-profile.self.cycles-pp.down_write
0.31 +0.0 0.34 ± 2% perf-profile.self.cycles-pp.security_file_permission
0.54 ± 3% +0.0 0.58 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
1.02 +0.0 1.06 perf-profile.self.cycles-pp.x64_sys_call
0.56 ± 2% +0.0 0.60 perf-profile.self.cycles-pp.file_remove_privs_flags
0.52 +0.0 0.56 ± 2% perf-profile.self.cycles-pp.file_update_time
0.41 ± 2% +0.0 0.45 perf-profile.self.cycles-pp.shmem_file_llseek
0.96 +0.0 1.00 perf-profile.self.cycles-pp.folio_unlock
1.07 +0.0 1.12 perf-profile.self.cycles-pp.up_write
1.36 +0.0 1.41 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.80 +0.0 0.85 perf-profile.self.cycles-pp.ksys_lseek
0.86 ± 2% +0.0 0.91 perf-profile.self.cycles-pp.generic_write_checks
1.20 +0.0 1.24 perf-profile.self.cycles-pp.rcu_all_qs
0.75 +0.1 0.80 ± 2% perf-profile.self.cycles-pp.shmem_write_begin
0.16 ± 4% +0.1 0.21 ± 4% perf-profile.self.cycles-pp.inode_to_bdi
2.05 +0.1 2.11 perf-profile.self.cycles-pp.mutex_unlock
0.62 ± 2% +0.1 0.69 perf-profile.self.cycles-pp.rw_verify_area
0.69 ± 2% +0.1 0.75 ± 3% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg
0.72 +0.1 0.79 perf-profile.self.cycles-pp.inode_needs_update_time
0.31 ± 2% +0.1 0.38 ± 3% perf-profile.self.cycles-pp.xas_start
0.93 +0.1 1.01 perf-profile.self.cycles-pp.folio_mark_accessed
1.98 +0.1 2.07 perf-profile.self.cycles-pp.__cond_resched
2.34 +0.1 2.43 perf-profile.self.cycles-pp.llseek
0.94 +0.1 1.04 perf-profile.self.cycles-pp.current_time
1.48 ± 3% +0.1 1.59 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
2.50 +0.1 2.62 perf-profile.self.cycles-pp.do_syscall_64
0.52 ± 4% +0.1 0.66 ± 7% perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
1.72 +0.1 1.87 perf-profile.self.cycles-pp.filemap_get_entry
4.03 +0.2 4.19 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
2.97 +0.2 3.15 perf-profile.self.cycles-pp.write
2.14 +0.2 2.33 perf-profile.self.cycles-pp.shmem_get_folio_gfp
4.22 +0.2 4.42 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
12.49 +0.5 12.96 perf-profile.self.cycles-pp.clear_bhb_loop
13.95 +0.7 14.70 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
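As a cross-check on how to read the %change column, the sketch below recomputes a few rows of the table above. It assumes, based on the numbers themselves rather than on anything stated in this report, that rows carrying a `%` sign report relative change, while perf-profile rows (no `%` sign) report an absolute difference in percentage points. The function names are illustrative.

```python
def relative_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_change(base, new):
    """Absolute difference, for metrics that are already percentages."""
    return new - base


# Values copied from the will-it-scale table above.
per_thread = relative_change(1208417, 1252295)
path_length = relative_change(683007, 660149)
write_path = point_change(31.54, 29.18)  # generic_perform_write calltrace row

print(f"per_thread_ops:        {per_thread:+.1f}%")
print(f"path-length:           {path_length:+.1f}%")
print(f"generic_perform_write: {write_path:+.1f} points")
```

Rounded to one decimal place, these match the +3.6%, -3.3% and -2.4 entries in the table, which is consistent with the assumed reading of the two kinds of rows.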
***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp9/fsbuffer-w/unixbench
commit:
654b33ada4 ("proc: fix UAF in proc_get_inode()")
665575cff0 ("filemap: move prefaulting out of hot write path")
654b33ada4ab5e92 665575cff098b696995ddaddf46
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  32471117             +4.6%  33974569       unixbench.throughput
      1819             +4.0%      1892       unixbench.time.user_time
 1.201e+10             +4.8% 1.259e+10       unixbench.workload
      0.33 ±  2%       +3.1%      0.34       perf-stat.i.MPKI
 4.577e+10             +1.4%  4.64e+10       perf-stat.i.branch-instructions
      0.02             -0.0       0.02       perf-stat.overall.branch-miss-rate%
      6053             -4.0%      5808       perf-stat.overall.path-length
 4.566e+10             +1.4% 4.629e+10       perf-stat.ps.branch-instructions
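For anyone post-processing these comparison tables, here is a minimal Python sketch of a row parser. It is not part of lkp-tests; the regular expression, the `parse_row` name and the returned field names are assumptions that fit the row layout seen in this report (base value, optional `± N%`, change, new value, optional `± N%`, metric name).

```python
import re

# One comparison row: base [± sd%] change[%] new [± sd%] metric
ROW = re.compile(
    r"^\s*(?P<base>\d[\d.]*(?:e[+-]?\d+)?)"
    r"(?:\s*±\s*(?P<base_sd>\d+)%)?"
    r"\s+(?P<change>[+-]\d[\d.]*)(?P<unit>%?)"
    r"\s+(?P<new>\d[\d.]*(?:e[+-]?\d+)?)"
    r"(?:\s*±\s*(?P<new_sd>\d+)%)?"
    r"\s+(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return a dict for a comparison row, or None for any other line."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m.group("metric"),
        "base": float(m.group("base")),
        "new": float(m.group("new")),
        "change": float(m.group("change")),
        # True when the change column carried a '%' sign
        "relative": m.group("unit") == "%",
        "base_stddev_pct": int(m.group("base_sd")) if m.group("base_sd") else None,
        "new_stddev_pct": int(m.group("new_sd")) if m.group("new_sd") else None,
    }


if __name__ == "__main__":
    print(parse_row("32471117 +4.6% 33974569 unixbench.throughput"))
    print(parse_row("0.33 ± 2% +3.1% 0.34 perf-stat.i.MPKI"))
```

Header, separator and commit lines do not match the pattern and come back as `None`, so the function can be mapped over the whole report body and the `None` results filtered out.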
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki