Message-ID: <202503181447.69ed9a01-lkp@intel.com>
Date: Tue, 18 Mar 2025 14:39:56 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Jakub Kicinski
<kuba@...nel.org>, Kuniyuki Iwashima <kuniyu@...zon.com>, Jason Xing
<kerneljasonxing@...il.com>, <netdev@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: [linux-next:master] [inet] 9544d60a26:
stress-ng.sockmany.ops_per_sec 4.5% improvement
Hello,
kernel test robot noticed a 4.5% improvement of stress-ng.sockmany.ops_per_sec on:
commit: 9544d60a2605d1500cf5c3e331a76b9eaf4538c9 ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
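For readers unfamiliar with the function being changed: inet_ehashfn() picks the bucket in the established-connections hash table by mixing the connection 4-tuple (local address/port, foreign address/port) with a boot-time secret, and the commit above adjusts how the local port feeds into that mix, which changes how connections spread across buckets and their per-bucket locks. The userspace sketch below only illustrates the general shape of such a 4-tuple hash; mix3(), toy_ehashfn() and all constants are invented for this sketch and are not the kernel's actual implementation.

#include <stdint.h>
#include <stdio.h>

/* Jenkins-style final mix of three 32-bit words (illustrative only). */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c)
{
	c ^= b; c -= (b << 14) | (b >> 18);
	a ^= c; a -= (c << 11) | (c >> 21);
	b ^= a; b -= (a << 25) | (a >> 7);
	c ^= b; c -= (b << 16) | (b >> 16);
	a ^= c; a -= (c << 4)  | (c >> 28);
	b ^= a; b -= (a << 14) | (a >> 18);
	c ^= b; c -= (b << 24) | (b >> 8);
	return c;
}

/*
 * Toy 4-tuple hash: addresses in two words, both ports packed into the
 * third word together with a per-boot secret. How lport contributes to
 * this mix is the knob the commit above turns in the real kernel.
 */
static uint32_t toy_ehashfn(uint32_t laddr, uint16_t lport,
			    uint32_t faddr, uint16_t fport,
			    uint32_t secret)
{
	return mix3(laddr, faddr,
		    (((uint32_t)lport << 16) | fport) + secret);
}

int main(void)
{
	uint32_t secret = 0x9e3779b9;	/* stand-in for the boot-time secret */
	uint32_t buckets = 65536;	/* hash table size, a power of two */
	uint32_t h;

	/* 127.0.0.1:40000 -> 127.0.0.1:80 */
	h = toy_ehashfn(0x7f000001, 40000, 0x7f000001, 80, secret);
	printf("bucket: %u\n", h & (buckets - 1));
	return 0;
}

With a power-of-two table, the bucket index is simply the hash masked by (size - 1), as in the last line of main(); how evenly the hash spreads over that mask determines how contended each bucket and its lock become.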
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: sockmany
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250318/202503181447.69ed9a01-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s
commit:
f8ece40786 ("tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()")
9544d60a26 ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
f8ece40786c93422 9544d60a2605d1500cf5c3e331a
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
0.03 ± 61% +75.0% 0.06 ± 13% vmstat.procs.b
197669 ± 9% +7.1% 211706 vmstat.system.cs
3052932 ± 2% +4.3% 3183417 proc-vmstat.nr_slab_unreclaimable
2120009 +2.2% 2166756 proc-vmstat.numa_hit
1888278 +2.1% 1927323 proc-vmstat.numa_local
303242 ± 3% +58.5% 480662 ± 2% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.17 ± 7% -16.7% 0.14 ± 11% sched_debug.cfs_rq:/.h_nr_runnable.stddev
303242 ± 3% +58.5% 480662 ± 2% sched_debug.cfs_rq:/.min_vruntime.stddev
4336410 +4.5% 4531719 stress-ng.sockmany.ops
71830 +4.5% 75040 stress-ng.sockmany.ops_per_sec
7490830 ± 5% +5.6% 7912072 stress-ng.time.voluntary_context_switches
688478 ± 2% -22.0% 537116 ± 3% perf-c2c.DRAM.local
612983 -19.5% 493390 ± 3% perf-c2c.DRAM.remote
22430 ± 2% +873.5% 218364 ± 11% perf-c2c.HITM.local
23141 ± 2% +846.6% 219069 ± 11% perf-c2c.HITM.total
40.09 ± 4% -17.0% 33.28 ± 3% perf-stat.i.MPKI
1.398e+10 ± 4% +17.0% 1.636e+10 perf-stat.i.branch-instructions
2.26 -0.1 2.14 perf-stat.i.branch-miss-rate%
3.091e+08 ± 4% +12.1% 3.467e+08 perf-stat.i.branch-misses
76.11 ± 3% -9.4 66.74 ± 3% perf-stat.i.cache-miss-rate%
3.694e+09 ± 4% +10.9% 4.096e+09 perf-stat.i.cache-references
8.50 ± 3% -11.6% 7.52 perf-stat.i.cpi
7.47e+10 ± 4% +16.5% 8.706e+10 perf-stat.i.instructions
38.96 -18.1% 31.93 ± 3% perf-stat.overall.MPKI
2.21 -0.1 2.12 perf-stat.overall.branch-miss-rate%
78.85 -11.0 67.89 ± 3% perf-stat.overall.cache-miss-rate%
8.30 -12.5% 7.27 perf-stat.overall.cpi
213.10 +6.9% 227.81 ± 2% perf-stat.overall.cycles-between-cache-misses
0.12 +14.3% 0.14 perf-stat.overall.ipc
1.375e+10 ± 4% +17.0% 1.609e+10 perf-stat.ps.branch-instructions
3.04e+08 ± 4% +12.2% 3.41e+08 perf-stat.ps.branch-misses
3.632e+09 ± 4% +10.9% 4.028e+09 perf-stat.ps.cache-references
204551 ± 9% +6.8% 218435 perf-stat.ps.context-switches
7.349e+10 ± 4% +16.5% 8.565e+10 perf-stat.ps.instructions
4.651e+12 +14.6% 5.328e+12 perf-stat.total.instructions
1.22 ±111% -98.2% 0.02 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
0.55 ± 11% -39.5% 0.33 ± 43% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
1.22 ±111% -96.5% 0.04 ±223% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
3.87 ± 83% +389.9% 18.96 ± 87% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
8.70 ± 30% +271.2% 32.31 ±109% perf-sched.sch_delay.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
3.84 ± 5% +516.8% 23.70 ± 82% perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
15.53 -13.3% 13.47 ± 2% perf-sched.total_wait_and_delay.average.ms
234871 +16.6% 273899 perf-sched.total_wait_and_delay.count.ms
15.48 -13.3% 13.42 ± 2% perf-sched.total_wait_time.average.ms
808.31 ± 27% -42.5% 464.90 ± 49% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
135.87 ± 16% -38.9% 83.00 ± 7% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
10.11 -14.0% 8.69 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
4.05 ± 3% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
103599 +16.3% 120485 perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
93.17 ± 19% +67.4% 156.00 ± 7% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
109023 ± 2% +17.2% 127816 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
1230 ± 3% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
15.55 ±106% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
9.98 -13.0% 8.68 perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
808.30 ± 27% -42.5% 464.89 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.81 ± 67% +8639.1% 157.80 ±217% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
135.32 ± 16% -38.9% 82.67 ± 6% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
10.09 -14.0% 8.68 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.03 ± 90% +1.1e+06% 372.59 ±111% perf-sched.wait_time.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
5.22 ± 70% +15512.3% 815.30 ±205% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki