Message-ID: <202506102156.1d2bde14-lkp@intel.com>
Date: Tue, 10 Jun 2025 21:57:53 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>, Jason Xing <kerneljasonxing@...il.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: [linus:master] [tcp] 86c2bc293b: stress-ng.sockmany.ops_per_sec 6.8% improvement
Hello,
kernel test robot noticed a 6.8% improvement of stress-ng.sockmany.ops_per_sec on:
commit: 86c2bc293b8130aec9fa504e953531a84a6eb9a6 ("tcp: use RCU lookup in __inet_hash_connect()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: sockmany
cpufreq_governor: performance
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250610/202506102156.1d2bde14-lkp@intel.com
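Outside the LKP harness, the parameters listed above could be roughly approximated with a plain stress-ng run. The sketch below only assembles such a command line. It assumes that "nr_threads: 100%" maps to one sockmany worker per online CPU and that "testtime: 60s" maps to the stress-ng timeout; the helper name sockmany_command is hypothetical, and the exact command line used by the robot is not given in this report.

```python
import os
import shlex


def sockmany_command(workers: int, seconds: int) -> list[str]:
    """Build a stress-ng command line for the sockmany stressor.

    Hypothetical helper: the report does not show the robot's own
    invocation, so this is only an approximation of its parameters.
    """
    return [
        "stress-ng",
        "--sockmany", str(workers),   # number of sockmany workers
        "--timeout", f"{seconds}s",   # run time, e.g. 60s
        "--metrics-brief",            # print bogo-ops summary at the end
    ]


# Assumption: "nr_threads: 100%" means one worker per online CPU.
workers = os.cpu_count() or 1
print(shlex.join(sockmany_command(workers, 60)))
```

The command is printed rather than executed, so it can be inspected before running it on a test machine. The cpufreq governor setting from the parameter list is not covered here.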
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s
commit:
d186f405fd ("tcp: add RCU management to inet_bind_bucket")
86c2bc293b ("tcp: use RCU lookup in __inet_hash_connect()")
d186f405fdf4229d 86c2bc293b8130aec9fa504e953
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.62 ± 3% +0.1 0.69 ± 2% mpstat.cpu.all.irq%
521879 -1.5% 514052 vmstat.system.in
4059292 +6.8% 4335271 stress-ng.sockmany.ops
67315 +6.8% 71863 stress-ng.sockmany.ops_per_sec
903062 +4.0% 939576 proc-vmstat.nr_slab_reclaimable
5715333 +5.7% 6043532 proc-vmstat.pgfree
30955 ± 4% -5.6% 29223 ± 3% proc-vmstat.pgreuse
617802 +12.5% 694736 ± 2% perf-c2c.DRAM.local
43535 ± 2% -55.2% 19524 ± 2% perf-c2c.HITM.local
13760 ± 4% -94.7% 726.83 ± 9% perf-c2c.HITM.remote
57296 ± 3% -64.7% 20251 ± 2% perf-c2c.HITM.total
4862651 ± 23% +26.2% 6137833 ± 6% sched_debug.cfs_rq:/.avg_vruntime.min
0.24 ± 6% +23.8% 0.30 ± 5% sched_debug.cfs_rq:/.h_nr_queued.stddev
4862651 ± 23% +26.2% 6137833 ± 6% sched_debug.cfs_rq:/.min_vruntime.min
0.24 ± 6% +23.3% 0.30 ± 6% sched_debug.cpu.nr_running.stddev
40590 ± 3% +18.8% 48233 ± 17% sched_debug.cpu.nr_switches.max
0.63 ± 12% +20.6% 0.76 ± 7% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.32 ± 10% -41.2% 0.19 ± 18% perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.19 ±195% +772.8% 1.62 ± 82% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.30 ± 31% +51.8% 3.49 ± 12% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
20.10 -23.3% 15.41 perf-sched.total_wait_and_delay.average.ms
177307 +32.5% 234941 perf-sched.total_wait_and_delay.count.ms
20.04 -23.4% 15.36 perf-sched.total_wait_time.average.ms
125.96 ±110% -73.3% 33.69 ± 17% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
13.68 -25.7% 10.16 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.65 ± 10% -41.0% 0.38 ± 18% perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
79042 +32.2% 104463 perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
81037 +34.4% 108937 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
1965 ± 9% +125.3% 4427 ± 3% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
2427 ± 3% +12.5% 2729 ± 2% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
13.36 ± 2% -25.0% 10.02 perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
13.66 -25.7% 10.15 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
0.33 ± 10% -40.8% 0.19 ± 18% perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
35.56 +15.4% 41.03 perf-stat.i.MPKI
1.386e+10 +3.1% 1.428e+10 perf-stat.i.branch-instructions
2.15 +0.1 2.26 perf-stat.i.branch-miss-rate%
2.923e+08 +8.8% 3.182e+08 perf-stat.i.branch-misses
71.48 +5.8 77.26 perf-stat.i.cache-miss-rate%
2.391e+09 +24.9% 2.985e+09 perf-stat.i.cache-misses
3.296e+09 +15.3% 3.802e+09 perf-stat.i.cache-references
9.36 -7.4% 8.66 perf-stat.i.cpi
291.67 -17.3% 241.22 perf-stat.i.cycles-between-cache-misses
7.053e+10 +8.2% 7.631e+10 perf-stat.i.instructions
0.12 +7.1% 0.13 perf-stat.i.ipc
34.03 +14.9% 39.11 perf-stat.overall.MPKI
2.11 +0.1 2.23 perf-stat.overall.branch-miss-rate%
72.58 +5.9 78.51 perf-stat.overall.cache-miss-rate%
9.04 -7.8% 8.34 perf-stat.overall.cpi
265.78 -19.8% 213.18 perf-stat.overall.cycles-between-cache-misses
0.11 +8.5% 0.12 perf-stat.overall.ipc
1.359e+10 +3.4% 1.405e+10 perf-stat.ps.branch-instructions
2.863e+08 +9.3% 3.129e+08 perf-stat.ps.branch-misses
2.353e+09 +24.7% 2.935e+09 perf-stat.ps.cache-misses
3.242e+09 +15.3% 3.739e+09 perf-stat.ps.cache-references
6.915e+10 +8.5% 7.506e+10 perf-stat.ps.instructions
4.246e+12 +8.2% 4.596e+12 perf-stat.total.instructions
66.41 ± 70% -49.8 16.57 ±223% perf-profile.calltrace.cycles-pp.stress_sockmany
66.32 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
66.32 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
66.32 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.connect.stress_sockmany
66.31 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.__sys_connect.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect
66.31 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
66.31 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.__inet_stream_connect.inet_stream_connect.__sys_connect.__x64_sys_connect.do_syscall_64
66.31 ± 70% -49.8 16.54 ±223% perf-profile.calltrace.cycles-pp.inet_stream_connect.__sys_connect.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe
66.25 ± 70% -49.7 16.52 ±223% perf-profile.calltrace.cycles-pp.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.__sys_connect.__x64_sys_connect
66.09 ± 70% -49.6 16.48 ±223% perf-profile.calltrace.cycles-pp.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.__sys_connect
54.17 ± 70% -38.3 15.86 ±223% perf-profile.calltrace.cycles-pp.__inet_check_established.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
10.32 ± 70% -10.3 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_bh.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
4.67 ± 70% -4.7 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_bh.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect
66.53 ± 70% -49.9 16.60 ±223% perf-profile.children.cycles-pp.do_syscall_64
66.53 ± 70% -49.9 16.60 ±223% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
66.41 ± 70% -49.8 16.57 ±223% perf-profile.children.cycles-pp.stress_sockmany
66.33 ± 70% -49.8 16.54 ±223% perf-profile.children.cycles-pp.connect
66.31 ± 70% -49.8 16.54 ±223% perf-profile.children.cycles-pp.__inet_stream_connect
66.31 ± 70% -49.8 16.54 ±223% perf-profile.children.cycles-pp.__sys_connect
66.31 ± 70% -49.8 16.54 ±223% perf-profile.children.cycles-pp.__x64_sys_connect
66.31 ± 70% -49.8 16.54 ±223% perf-profile.children.cycles-pp.inet_stream_connect
66.25 ± 70% -49.7 16.52 ±223% perf-profile.children.cycles-pp.tcp_v4_connect
66.21 ± 70% -49.7 16.50 ±223% perf-profile.children.cycles-pp.__inet_hash_connect
54.25 ± 70% -38.4 15.89 ±223% perf-profile.children.cycles-pp.__inet_check_established
10.37 ± 70% -10.4 0.00 perf-profile.children.cycles-pp._raw_spin_lock_bh
4.67 ± 70% -4.7 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
53.42 ± 70% -37.8 15.58 ±223% perf-profile.self.cycles-pp.__inet_check_established
5.65 ± 70% -5.6 0.00 perf-profile.self.cycles-pp._raw_spin_lock_bh
4.62 ± 70% -4.6 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
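As a reading aid for the %change column above, here is a minimal sketch that recomputes a few entries from the two value columns. It assumes, based on how the rows appear in this report, that metrics reported as counts show a relative change in percent, while metrics that are already percentages (names ending in "%", and the perf-profile rows) show an absolute difference in points. The function names are illustrative only.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, for rows whose %change ends in '%'."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, for rows whose values are already percentages."""
    return new - base


# (base commit d186f405fd, tested commit 86c2bc293b), values from the table.
rows = {
    "stress-ng.sockmany.ops_per_sec": (67315, 71863),
    "perf-c2c.HITM.local": (43535, 19524),
    "perf-c2c.HITM.remote": (13760, 726.83),
    "perf-c2c.HITM.total": (57296, 20251),
}

for name, (base, new) in rows.items():
    print(f"{name}: {pct_change(base, new):+.1f}%")

# perf-stat.overall.cache-miss-rate% is itself a percentage, so the
# table lists the difference in points rather than a relative change.
print(f"perf-stat.overall.cache-miss-rate%: {point_change(72.58, 78.51):+.1f}")
```

With the values above this gives +6.8% for ops_per_sec and -55.2%, -94.7% and -64.7% for the three HITM rows, matching the table, and +5.9 points for the cache-miss rate.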
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki