Message-ID: <202506261012.11b518e7-lkp@intel.com>
Date: Thu, 26 Jun 2025 10:57:15 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Herbert Xu <herbert@...dor.apana.org.au>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-crypto@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <oliver.sang@...el.com>
Subject: [linux-next:master] [padata] 71203f68c7: unixbench.throughput 3.1%
regression
Hello,
Normally we do not report performance results when we suspect they are caused by
alignment effects. However, since this patch touches alignment-sensitive code:
- struct work_struct reorder_work;
- spinlock_t ____cacheline_aligned lock;
we are sending the report below anyway, FYI, to show the possible performance impact.
kernel test robot noticed a 3.1% regression of unixbench.throughput on:
commit: 71203f68c7749609d7fc8ae6ad054bdedeb24f91 ("padata: Fix pd UAF once and for all")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 1b152eeca84a02bdb648f16b82ef3394007a9dcf]
testcase: unixbench
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
runtime: 300s
nr_task: 100%
test: fsbuffer-w
cpufreq_governor: performance
In addition, the commit also has a significant impact on the following tests:
+------------------+--------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 1.1% improvement |
| test machine | 104 threads 2 sockets (Skylake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=thread |
| | nr_task=100% |
| | test=pwrite1 |
+------------------+--------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202506261012.11b518e7-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250626/202506261012.11b518e7-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp9/fsbuffer-w/unixbench
commit:
73c2437109 ("crypto: s390/sha3 - Use cpu byte-order when exporting")
71203f68c7 ("padata: Fix pd UAF once and for all")
73c2437109c3eab2 71203f68c7749609d7fc8ae6ad0
---------------- ---------------------------
%stddev %change %stddev
\ | \
111306 +2.0% 113530 proc-vmstat.pgreuse
0.01 ± 4% +14.9% 0.01 perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
715.14 ±167% -99.4% 3.99 ± 80% perf-sched.total_sch_delay.max.ms
33278354 -3.1% 32249523 unixbench.throughput
1854 -4.8% 1765 unixbench.time.user_time
1.233e+10 -3.1% 1.195e+10 unixbench.workload
4.717e+10 -3.1% 4.573e+10 perf-stat.i.branch-instructions
0.42 -0.0 0.41 perf-stat.i.branch-miss-rate%
28489209 ± 2% -10.9% 25397034 perf-stat.i.branch-misses
0.97 +2.8% 1.00 perf-stat.i.cpi
1.946e+11 -3.1% 1.886e+11 perf-stat.i.instructions
1.05 -2.8% 1.02 perf-stat.i.ipc
0.06 ± 2% -0.0 0.06 perf-stat.overall.branch-miss-rate%
0.94 +3.2% 0.97 perf-stat.overall.cpi
1.06 -3.1% 1.03 perf-stat.overall.ipc
4.706e+10 -3.1% 4.562e+10 perf-stat.ps.branch-instructions
28421825 ± 2% -10.9% 25336865 perf-stat.ps.branch-misses
1.942e+11 -3.1% 1.882e+11 perf-stat.ps.instructions
7.212e+13 -3.1% 6.991e+13 perf-stat.total.instructions
***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pwrite1/will-it-scale
commit:
73c2437109 ("crypto: s390/sha3 - Use cpu byte-order when exporting")
71203f68c7 ("padata: Fix pd UAF once and for all")
73c2437109c3eab2 71203f68c7749609d7fc8ae6ad0
---------------- ---------------------------
%stddev %change %stddev
\ | \
997120 +1.5% 1011812 proc-vmstat.pgfree
55606929 +1.1% 56223715 will-it-scale.104.threads
534681 +1.1% 540612 will-it-scale.per_thread_ops
55606929 +1.1% 56223715 will-it-scale.workload
0.01 ± 34% +63.9% 0.02 ± 31% perf-sched.sch_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
233.78 ±143% +242.3% 800.28 perf-sched.wait_and_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
74.68 ± 6% +18.5% 88.49 ± 7% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
4.69 ± 44% -84.7% 0.72 ± 30% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
576.00 ± 9% -17.7% 473.83 ± 7% perf-sched.wait_and_delay.count.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
999.68 -98.9% 11.47 ± 85% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
234.63 ±142% +240.8% 799.73 perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
74.67 ± 6% +18.5% 88.46 ± 7% perf-sched.wait_time.avg.ms.anon_pipe_read.fifo_pipe_read.vfs_read.ksys_read
4.31 ± 48% -91.6% 0.36 ± 29% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
999.37 -99.3% 6.65 ± 67% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
1.686e+10 +1.1% 1.704e+10 perf-stat.i.branch-instructions
1.62 +0.0 1.67 perf-stat.i.branch-miss-rate%
2.726e+08 +4.1% 2.836e+08 perf-stat.i.branch-misses
3.36 -1.0% 3.32 perf-stat.i.cpi
8.562e+10 +1.1% 8.656e+10 perf-stat.i.instructions
1.62 +0.0 1.66 perf-stat.overall.branch-miss-rate%
3.36 -1.0% 3.33 perf-stat.overall.cpi
1.68e+10 +1.1% 1.698e+10 perf-stat.ps.branch-instructions
2.717e+08 +4.1% 2.827e+08 perf-stat.ps.branch-misses
8.533e+10 +1.1% 8.627e+10 perf-stat.ps.instructions
2.578e+13 +1.1% 2.607e+13 perf-stat.total.instructions
4.44 ± 3% -0.9 3.51 ± 5% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_pwrite
11.68 -0.3 11.41 perf-profile.calltrace.cycles-pp.copy_folio_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
0.85 ± 8% -0.2 0.70 ± 3% perf-profile.calltrace.cycles-pp.file_remove_privs_flags.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
0.81 ± 3% -0.1 0.67 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
2.26 -0.1 2.13 perf-profile.calltrace.cycles-pp.fdget.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
0.94 -0.0 0.90 ± 2% perf-profile.calltrace.cycles-pp.noop_dirty_folio.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
1.14 +0.1 1.19 ± 2% perf-profile.calltrace.cycles-pp.folio_mark_dirty.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
2.18 ± 2% +0.4 2.58 ± 10% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
2.58 ± 3% -0.5 2.07 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
7.87 -0.5 7.38 perf-profile.children.cycles-pp.entry_SYSCALL_64
11.90 -0.3 11.62 perf-profile.children.cycles-pp.copy_folio_from_iter_atomic
9.13 -0.3 8.87 perf-profile.children.cycles-pp.rep_movs_alternative
0.87 ± 8% -0.1 0.72 ± 3% perf-profile.children.cycles-pp.file_remove_privs_flags
0.84 ± 3% -0.1 0.71 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
2.26 -0.1 2.13 perf-profile.children.cycles-pp.fdget
1.01 -0.0 0.96 perf-profile.children.cycles-pp.noop_dirty_folio
0.43 ± 2% -0.0 0.39 ± 3% perf-profile.children.cycles-pp.rcu_all_qs
0.29 ± 3% -0.0 0.26 perf-profile.children.cycles-pp.inode_to_bdi
0.30 -0.0 0.27 perf-profile.children.cycles-pp.x64_sys_call
0.35 ± 2% -0.0 0.32 perf-profile.children.cycles-pp.rw_verify_area
0.39 ± 3% +0.2 0.58 ± 24% perf-profile.children.cycles-pp.xas_load
2.20 ± 2% +0.4 2.61 ± 10% perf-profile.children.cycles-pp.filemap_get_entry
6.97 -0.5 6.48 perf-profile.self.cycles-pp.entry_SYSCALL_64
2.14 ± 3% -0.4 1.71 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp
2.60 -0.3 2.28 ± 4% perf-profile.self.cycles-pp.do_syscall_64
8.90 -0.2 8.65 perf-profile.self.cycles-pp.rep_movs_alternative
2.28 -0.2 2.10 perf-profile.self.cycles-pp.shmem_write_end
0.86 ± 8% -0.1 0.71 ± 3% perf-profile.self.cycles-pp.file_remove_privs_flags
2.24 -0.1 2.11 perf-profile.self.cycles-pp.fdget
0.58 ± 4% -0.1 0.48 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
0.70 ± 3% -0.1 0.62 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.61 ± 2% -0.1 0.55 perf-profile.self.cycles-pp.generic_write_checks
0.34 ± 2% -0.0 0.31 ± 3% perf-profile.self.cycles-pp.rcu_all_qs
0.28 -0.0 0.25 perf-profile.self.cycles-pp.x64_sys_call
0.25 ± 4% -0.0 0.22 perf-profile.self.cycles-pp.inode_to_bdi
0.24 ± 3% -0.0 0.21 perf-profile.self.cycles-pp.rw_verify_area
0.77 ± 2% +0.0 0.82 ± 2% perf-profile.self.cycles-pp.folio_mark_dirty
0.72 +0.1 0.78 ± 2% perf-profile.self.cycles-pp.current_time
0.20 ± 3% +0.2 0.39 ± 31% perf-profile.self.cycles-pp.xas_load
1.80 ± 2% +0.2 2.02 ± 6% perf-profile.self.cycles-pp.filemap_get_entry
9.38 +0.2 9.61 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
2.18 ± 2% +0.3 2.50 ± 5% perf-profile.self.cycles-pp.shmem_write_begin
2.54 +0.8 3.37 ± 4% perf-profile.self.cycles-pp.__libc_pwrite
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki