Message-ID: <202503102159.5f78c207-lkp@intel.com>
Date: Mon, 10 Mar 2025 22:03:06 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <netdev@...r.kernel.org>,
"David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Neal Cardwell <ncardwell@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>, Jason Xing
<kerneljasonxing@...il.com>, Simon Horman <horms@...nel.org>,
<eric.dumazet@...il.com>, Eric Dumazet <edumazet@...gle.com>,
<oliver.sang@...el.com>
Subject: Re: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
Hello,
kernel test robot noticed a 6.9% improvement of stress-ng.sockmany.ops_per_sec on:
commit: ba6c94b99d772f431fd589dd2cd606b59063557b ("[PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/tcp-use-RCU-in-__inet-6-_check_established/20250302-204711
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git f77f12010f67259bd0e1ad18877ed27c721b627a
patch link: https://lore.kernel.org/all/20250302124237.3913746-5-edumazet@google.com/
patch subject: [PATCH net-next 4/4] tcp: use RCU lookup in __inet_hash_connect()
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
	nr_threads: 100%
	testtime: 60s
	test: sockmany
	cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250310/202503102159.5f78c207-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s
commit:
4f97f75a5b ("tcp: add RCU management to inet_bind_bucket")
ba6c94b99d ("tcp: use RCU lookup in __inet_hash_connect()")
4f97f75a5bfa79ba ba6c94b99d772f431fd589dd2cd
---------------- ---------------------------
%stddev %change %stddev
\ | \
1742139 ± 89% -91.6% 146373 ± 56% numa-meminfo.node1.Unevictable
0.61 ± 3% +0.1 0.71 ± 3% mpstat.cpu.all.irq%
0.42 +0.0 0.46 ± 2% mpstat.cpu.all.usr%
435534 ± 89% -91.6% 36593 ± 56% numa-vmstat.node1.nr_unevictable
435534 ± 89% -91.6% 36593 ± 56% numa-vmstat.node1.nr_zone_unevictable
4057584 +7.0% 4340521 stress-ng.sockmany.ops
67264 +6.9% 71933 stress-ng.sockmany.ops_per_sec
604900 +12.3% 679404 ± 4% perf-c2c.DRAM.local
42998 ± 2% -55.7% 19034 ± 3% perf-c2c.HITM.local
13764 ± 4% -95.2% 663.67 ± 13% perf-c2c.HITM.remote
56762 ± 2% -65.3% 19698 ± 4% perf-c2c.HITM.total
7422009 +13.2% 8403980 ± 2% sched_debug.cfs_rq:/.avg_vruntime.max
195564 ± 5% +62.7% 318178 ± 10% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.23 ± 7% +25.4% 0.29 ± 4% sched_debug.cfs_rq:/.h_nr_queued.stddev
39935 ± 4% +27.0% 50726 ± 29% sched_debug.cfs_rq:/.load_avg.max
7422009 +13.2% 8403980 ± 2% sched_debug.cfs_rq:/.min_vruntime.max
195564 ± 5% +62.7% 318178 ± 10% sched_debug.cfs_rq:/.min_vruntime.stddev
0.23 ± 6% +26.6% 0.29 ± 4% sched_debug.cpu.nr_running.stddev
387640 +5.9% 410501 ± 9% proc-vmstat.nr_active_anon
109911 ± 2% +8.5% 119206 ± 2% proc-vmstat.nr_mapped
200627 +1.9% 204454 proc-vmstat.nr_shmem
895041 +4.9% 939289 proc-vmstat.nr_slab_reclaimable
2982921 +5.0% 3131084 proc-vmstat.nr_slab_unreclaimable
387640 +5.9% 410501 ± 9% proc-vmstat.nr_zone_active_anon
2071760 +2.0% 2112591 proc-vmstat.numa_hit
1839824 +2.2% 1880606 proc-vmstat.numa_local
5905025 +5.2% 6210697 proc-vmstat.pgalloc_normal
5291411 ± 12% +11.9% 5921072 proc-vmstat.pgfree
0.82 ± 13% -29.0% 0.58 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
4.50 ± 16% +29.5% 5.83 ± 15% perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.03 ± 56% -88.8% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
0.07 ±125% +3754.0% 2.67 ± 71% perf-sched.sch_delay.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
19.83 -22.3% 15.41 perf-sched.total_wait_and_delay.average.ms
177991 +32.7% 236147 perf-sched.total_wait_and_delay.count.ms
19.76 -22.3% 15.35 perf-sched.total_wait_time.average.ms
1.64 ± 12% -28.9% 1.17 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
13.69 -26.2% 10.10 perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
6844 +11.8% 7651 ± 3% perf-sched.wait_and_delay.count.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
78701 +33.6% 105168 perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
81026 +35.2% 109539 perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
2268 ± 14% +90.6% 4325 ± 6% perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
0.82 ± 12% -28.6% 0.59 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
13.49 -26.5% 9.91 perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
3.05 ± 3% +16.5% 3.55 ± 3% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
30.10 ± 20% -64.4% 10.72 ±113% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
1.14 ± 9% +22.2% 1.40 ± 7% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
13.67 -26.3% 10.08 perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
7.36 ± 57% +103.9% 15.01 ± 27% perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.03 ± 56% -88.8% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
0.07 ±125% +4e+05% 275.31 ±115% perf-sched.wait_time.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
35.70 +15.3% 41.18 perf-stat.i.MPKI
1.368e+10 +4.6% 1.431e+10 perf-stat.i.branch-instructions
2.15 +0.1 2.27 perf-stat.i.branch-miss-rate%
2.884e+08 +10.7% 3.192e+08 perf-stat.i.branch-misses
71.62 +5.5 77.09 perf-stat.i.cache-miss-rate%
2.377e+09 +26.3% 3.003e+09 perf-stat.i.cache-misses
3.264e+09 +17.4% 3.832e+09 perf-stat.i.cache-references
9.40 -8.1% 8.64 perf-stat.i.cpi
292.27 -18.0% 239.70 perf-stat.i.cycles-between-cache-misses
6.963e+10 +9.8% 7.645e+10 perf-stat.i.instructions
0.12 ± 2% +7.3% 0.13 perf-stat.i.ipc
34.12 +15.0% 39.25 perf-stat.overall.MPKI
2.11 +0.1 2.23 perf-stat.overall.branch-miss-rate%
72.81 +5.5 78.36 perf-stat.overall.cache-miss-rate%
9.07 -8.4% 8.31 perf-stat.overall.cpi
265.92 -20.4% 211.72 perf-stat.overall.cycles-between-cache-misses
0.11 +9.2% 0.12 perf-stat.overall.ipc
1.345e+10 +4.6% 1.408e+10 perf-stat.ps.branch-instructions
2.835e+08 +10.7% 3.139e+08 perf-stat.ps.branch-misses
2.337e+09 +26.3% 2.952e+09 perf-stat.ps.cache-misses
3.209e+09 +17.4% 3.768e+09 perf-stat.ps.cache-references
6.849e+10 +9.8% 7.521e+10 perf-stat.ps.instructions
4.236e+12 +9.1% 4.621e+12 perf-stat.total.instructions
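
For readers who want to sanity-check the table, the sketch below recomputes a few of the reported figures from the raw columns. It is only an illustration, not part of the lkp tooling: the helper names `pct_change` and `mpki` are hypothetical, and the formulas are assumptions inferred from the numbers above (relative change in percent for plain counters, cache misses per thousand instructions for MPKI). Rows whose metric name ends in `%` appear to report an absolute difference rather than a relative one (for example 72.81 -> 78.36 is shown as +5.5); that reading is also an inference from the table.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched value over the base value, in percent."""
    return (patched - base) / base * 100.0


def mpki(cache_misses: float, instructions: float) -> float:
    """Cache misses per thousand instructions (assumed definition of MPKI)."""
    return cache_misses / instructions * 1000.0


# Values copied from the table above (base commit vs. patched commit).
ops_per_sec_change = pct_change(67264, 71933)      # stress-ng.sockmany.ops_per_sec
hitm_total_change = pct_change(56762, 19698)       # perf-c2c.HITM.total
mpki_base = mpki(2.337e9, 6.849e10)                # perf-stat.ps.* on 4f97f75a5b
mpki_patched = mpki(2.952e9, 7.521e10)             # perf-stat.ps.* on ba6c94b99d

# Assumed convention for metrics ending in '%': absolute difference.
cache_miss_rate_delta = 78.36 - 72.81              # perf-stat.overall.cache-miss-rate%

print(f"ops_per_sec change: {ops_per_sec_change:+.1f}%")
print(f"HITM.total change:  {hitm_total_change:+.1f}%")
print(f"MPKI: {mpki_base:.2f} -> {mpki_patched:.2f}")
print(f"cache-miss-rate% delta: {cache_miss_rate_delta:+.2f}")
```

The recomputed values line up with the ops_per_sec, HITM.total and perf-stat.overall.MPKI rows; small differences on other rows are expected because the table prints rounded averages.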
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki