Message-ID: <202506102156.1d2bde14-lkp@intel.com>
Date: Tue, 10 Jun 2025 21:57:53 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Jakub Kicinski <kuba@...nel.org>, Jason Xing <kerneljasonxing@...il.com>,
	Kuniyuki Iwashima <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
	<oliver.sang@...el.com>
Subject: [linus:master] [tcp]  86c2bc293b:  stress-ng.sockmany.ops_per_sec 6.8% improvement



Hello,

kernel test robot noticed a 6.8% improvement in stress-ng.sockmany.ops_per_sec on:


commit: 86c2bc293b8130aec9fa504e953531a84a6eb9a6 ("tcp: use RCU lookup in __inet_hash_connect()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockmany
	cpufreq_governor: performance




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250610/202506102156.1d2bde14-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s

commit: 
  d186f405fd ("tcp: add RCU management to inet_bind_bucket")
  86c2bc293b ("tcp: use RCU lookup in __inet_hash_connect()")

d186f405fdf4229d 86c2bc293b8130aec9fa504e953 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.62 ±  3%      +0.1        0.69 ±  2%  mpstat.cpu.all.irq%
    521879            -1.5%     514052        vmstat.system.in
   4059292            +6.8%    4335271        stress-ng.sockmany.ops
     67315            +6.8%      71863        stress-ng.sockmany.ops_per_sec
    903062            +4.0%     939576        proc-vmstat.nr_slab_reclaimable
   5715333            +5.7%    6043532        proc-vmstat.pgfree
     30955 ±  4%      -5.6%      29223 ±  3%  proc-vmstat.pgreuse
    617802           +12.5%     694736 ±  2%  perf-c2c.DRAM.local
     43535 ±  2%     -55.2%      19524 ±  2%  perf-c2c.HITM.local
     13760 ±  4%     -94.7%     726.83 ±  9%  perf-c2c.HITM.remote
     57296 ±  3%     -64.7%      20251 ±  2%  perf-c2c.HITM.total
   4862651 ± 23%     +26.2%    6137833 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.min
      0.24 ±  6%     +23.8%       0.30 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.stddev
   4862651 ± 23%     +26.2%    6137833 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
      0.24 ±  6%     +23.3%       0.30 ±  6%  sched_debug.cpu.nr_running.stddev
     40590 ±  3%     +18.8%      48233 ± 17%  sched_debug.cpu.nr_switches.max
      0.63 ± 12%     +20.6%       0.76 ±  7%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.32 ± 10%     -41.2%       0.19 ± 18%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.19 ±195%    +772.8%       1.62 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.30 ± 31%     +51.8%       3.49 ± 12%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     20.10           -23.3%      15.41        perf-sched.total_wait_and_delay.average.ms
    177307           +32.5%     234941        perf-sched.total_wait_and_delay.count.ms
     20.04           -23.4%      15.36        perf-sched.total_wait_time.average.ms
    125.96 ±110%     -73.3%      33.69 ± 17%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     13.68           -25.7%      10.16        perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.65 ± 10%     -41.0%       0.38 ± 18%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     79042           +32.2%     104463        perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
     81037           +34.4%     108937        perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      1965 ±  9%    +125.3%       4427 ±  3%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      2427 ±  3%     +12.5%       2729 ±  2%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     13.36 ±  2%     -25.0%      10.02        perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
     13.66           -25.7%      10.15        perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.33 ± 10%     -40.8%       0.19 ± 18%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     35.56           +15.4%      41.03        perf-stat.i.MPKI
 1.386e+10            +3.1%  1.428e+10        perf-stat.i.branch-instructions
      2.15            +0.1        2.26        perf-stat.i.branch-miss-rate%
 2.923e+08            +8.8%  3.182e+08        perf-stat.i.branch-misses
     71.48            +5.8       77.26        perf-stat.i.cache-miss-rate%
 2.391e+09           +24.9%  2.985e+09        perf-stat.i.cache-misses
 3.296e+09           +15.3%  3.802e+09        perf-stat.i.cache-references
      9.36            -7.4%       8.66        perf-stat.i.cpi
    291.67           -17.3%     241.22        perf-stat.i.cycles-between-cache-misses
 7.053e+10            +8.2%  7.631e+10        perf-stat.i.instructions
      0.12            +7.1%       0.13        perf-stat.i.ipc
     34.03           +14.9%      39.11        perf-stat.overall.MPKI
      2.11            +0.1        2.23        perf-stat.overall.branch-miss-rate%
     72.58            +5.9       78.51        perf-stat.overall.cache-miss-rate%
      9.04            -7.8%       8.34        perf-stat.overall.cpi
    265.78           -19.8%     213.18        perf-stat.overall.cycles-between-cache-misses
      0.11            +8.5%       0.12        perf-stat.overall.ipc
 1.359e+10            +3.4%  1.405e+10        perf-stat.ps.branch-instructions
 2.863e+08            +9.3%  3.129e+08        perf-stat.ps.branch-misses
 2.353e+09           +24.7%  2.935e+09        perf-stat.ps.cache-misses
 3.242e+09           +15.3%  3.739e+09        perf-stat.ps.cache-references
 6.915e+10            +8.5%  7.506e+10        perf-stat.ps.instructions
 4.246e+12            +8.2%  4.596e+12        perf-stat.total.instructions
     66.41 ± 70%     -49.8       16.57 ±223%  perf-profile.calltrace.cycles-pp.stress_sockmany
     66.32 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
     66.32 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
     66.32 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.connect.stress_sockmany
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.__sys_connect.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe.connect.stress_sockmany
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.__inet_stream_connect.inet_stream_connect.__sys_connect.__x64_sys_connect.do_syscall_64
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.calltrace.cycles-pp.inet_stream_connect.__sys_connect.__x64_sys_connect.do_syscall_64.entry_SYSCALL_64_after_hwframe
     66.25 ± 70%     -49.7       16.52 ±223%  perf-profile.calltrace.cycles-pp.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.__sys_connect.__x64_sys_connect
     66.09 ± 70%     -49.6       16.48 ±223%  perf-profile.calltrace.cycles-pp.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.__sys_connect
     54.17 ± 70%     -38.3       15.86 ±223%  perf-profile.calltrace.cycles-pp.__inet_check_established.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
     10.32 ± 70%     -10.3        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_bh.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      4.67 ± 70%      -4.7        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_bh.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect
     66.53 ± 70%     -49.9       16.60 ±223%  perf-profile.children.cycles-pp.do_syscall_64
     66.53 ± 70%     -49.9       16.60 ±223%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     66.41 ± 70%     -49.8       16.57 ±223%  perf-profile.children.cycles-pp.stress_sockmany
     66.33 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.__inet_stream_connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.__sys_connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.__x64_sys_connect
     66.31 ± 70%     -49.8       16.54 ±223%  perf-profile.children.cycles-pp.inet_stream_connect
     66.25 ± 70%     -49.7       16.52 ±223%  perf-profile.children.cycles-pp.tcp_v4_connect
     66.21 ± 70%     -49.7       16.50 ±223%  perf-profile.children.cycles-pp.__inet_hash_connect
     54.25 ± 70%     -38.4       15.89 ±223%  perf-profile.children.cycles-pp.__inet_check_established
     10.37 ± 70%     -10.4        0.00        perf-profile.children.cycles-pp._raw_spin_lock_bh
      4.67 ± 70%      -4.7        0.00        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     53.42 ± 70%     -37.8       15.58 ±223%  perf-profile.self.cycles-pp.__inet_check_established
      5.65 ± 70%      -5.6        0.00        perf-profile.self.cycles-pp._raw_spin_lock_bh
      4.62 ± 70%      -4.6        0.00        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
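
As a reading aid for the table above, the sketch below recomputes a few of the reported figures from the raw columns. It is not part of the lkp tooling; the helper names (pct_change, mpki, cache_miss_rate) are hypothetical and chosen only for illustration. It assumes that %change is the relative difference between the parent-commit mean (left column) and the tested-commit mean (right column), that MPKI is cache misses per thousand instructions, and that the miss rate is cache-misses divided by cache-references. The numbers in the table are consistent with these assumptions. Rows whose change has no "%" sign (for example cache-miss-rate%, +5.9) appear to be absolute differences in percentage points rather than relative changes.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def mpki(cache_misses, instructions):
    """Cache misses per thousand instructions."""
    return cache_misses / instructions * 1000.0


def cache_miss_rate(cache_misses, cache_references):
    """Share of cache references that missed, in percent."""
    return cache_misses / cache_references * 100.0


# Values copied from the table (parent commit d186f405fd vs. 86c2bc293b).
ops_base, ops_new = 67315, 71863                # stress-ng.sockmany.ops_per_sec
hitm_remote_base, hitm_remote_new = 13760, 726.83   # perf-c2c.HITM.remote
miss_base, miss_new = 2.353e9, 2.935e9          # perf-stat.ps.cache-misses
refs_base, refs_new = 3.242e9, 3.739e9          # perf-stat.ps.cache-references
insn_base, insn_new = 6.915e10, 7.506e10        # perf-stat.ps.instructions

print(f"ops_per_sec change:  {pct_change(ops_base, ops_new):+.1f}%")
print(f"HITM.remote change:  {pct_change(hitm_remote_base, hitm_remote_new):+.1f}%")
print(f"MPKI:                {mpki(miss_base, insn_base):.2f} -> {mpki(miss_new, insn_new):.2f}")
print(f"cache-miss-rate:     {cache_miss_rate(miss_base, refs_base):.2f}% -> "
      f"{cache_miss_rate(miss_new, refs_new):.2f}%")
# Percentage rows are compared by absolute difference (percentage points).
print(f"miss-rate delta:     {cache_miss_rate(miss_new, refs_new) - cache_miss_rate(miss_base, refs_base):+.1f} points")
```

Because the table values are already rounded, recomputed results can differ from the reported ones in the last digit. Read together, the rows suggest that cache-to-cache contention (HITM) dropped sharply while plain cache misses per instruction went up, with throughput improving overall; that interpretation is a reading of the data, not a claim made by the report.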




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

