Message-ID: <202503171623.f2e16b60-lkp@intel.com>
Date: Mon, 17 Mar 2025 21:44:54 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <netdev@...r.kernel.org>,
	"David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>, Neal Cardwell <ncardwell@...gle.com>,
	Kuniyuki Iwashima <kuniyu@...zon.com>, Jason Xing <kernelxing@...cent.com>,
	Simon Horman <horms@...nel.org>, <eric.dumazet@...il.com>, Eric Dumazet
	<edumazet@...gle.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH net-next 1/2] inet: change lport contribution to
 inet_ehashfn() and inet6_ehashfn()



Hello,

kernel test robot noticed a 26.0% improvement in stress-ng.sockmany.ops_per_sec on:


commit: 265acc444f8a96246e9d42b54b6931d078034218 ("[PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/inet-change-lport-contribution-to-inet_ehashfn-and-inet6_ehashfn/20250305-114734
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git f252f23ab657cd224cb8334ba69966396f3f629b
patch link: https://lore.kernel.org/all/20250305034550.879255-2-edumazet@google.com/
patch subject: [PATCH net-next 1/2] inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()
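
For readers unfamiliar with the hash in question: inet_ehashfn() picks the bucket in the TCP established hash table (ehash) for a given 4-tuple. Before this patch, the kernel added lport to the hash of the remaining fields (roughly lport + __inet_ehashfn(laddr, 0, faddr, fport, secret)), so a connect() storm walking consecutive ephemeral ports lands in consecutive buckets, whose chain heads sit on adjacent cache lines. The sketch below is a minimal user-space illustration of that effect only, not the kernel code or the actual patch: mix32() is a stand-in finalizer rather than the kernel's jhash, the table size and bucket math are simplified, and ehash_new() assumes nothing beyond the patch folding lport into the hashed input.

/*
 * Illustrative user-space sketch only -- NOT the kernel code or the patch.
 * mix32() stands in for the kernel's jhash-based __inet_ehashfn().
 */
#include <stdio.h>
#include <stdint.h>

#define EHASH_BUCKETS (1u << 16)	/* pretend ehash size (power of two) */

/* Stand-in 32-bit mixer (murmur3 finalizer), not the kernel's jhash. */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/* Pre-patch shape: lport added after hashing the other fields,
 * so consecutive local ports yield consecutive hash values. */
static uint32_t ehash_old(uint32_t laddr, uint16_t lport,
			  uint32_t faddr, uint16_t fport)
{
	return lport + mix32(laddr ^ faddr ^ fport);
}

/* Post-patch shape (assumed): lport folded into the hashed input,
 * so consecutive local ports are spread across the table. */
static uint32_t ehash_new(uint32_t laddr, uint16_t lport,
			  uint32_t faddr, uint16_t fport)
{
	return mix32(laddr ^ faddr ^ ((uint32_t)lport << 16 | fport));
}

int main(void)
{
	uint32_t laddr = 0x7f000001, faddr = 0x7f000001;	/* 127.0.0.1 */
	uint16_t fport = 80;

	/* Consecutive ephemeral ports, as a connect() storm would use. */
	for (unsigned int lport = 40000; lport < 40004; lport++)
		printf("lport %u: old bucket %5u, new bucket %5u\n", lport,
		       (unsigned int)(ehash_old(laddr, lport, faddr, fport) & (EHASH_BUCKETS - 1)),
		       (unsigned int)(ehash_new(laddr, lport, faddr, fport) & (EHASH_BUCKETS - 1)));
	return 0;
}

Running this shows the old-style buckets coming out consecutive while the new-style ones scatter; at kernel scale that is the locality and chain-head contention difference a connect()-heavy workload such as sockmany is sensitive to (compare the __inet_hash_connect entries in the perf-sched data below).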

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockmany
	cpufreq_governor: performance


In addition, the commit has a significant impact on the following test:

+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.sockmany.ops_per_sec 4.4% improvement                                  |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                                |
|                  | nr_threads=100%                                                                             |
|                  | test=sockmany                                                                               |
|                  | testtime=60s                                                                                |
+------------------+---------------------------------------------------------------------------------------------+




Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250317/202503171623.f2e16b60-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/sockmany/stress-ng/60s

commit: 
  f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
  265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")

f252f23ab657cd22 265acc444f8a96246e9d42b54b6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.60 ±  6%      +0.2        0.75 ±  6%  mpstat.cpu.all.soft%
    376850 ±  9%     +15.7%     436068 ±  9%  numa-numastat.node0.local_node
    376612 ±  9%     +15.8%     435968 ±  9%  numa-vmstat.node0.numa_local
     54708           +22.0%      66753 ±  2%  vmstat.system.cs
      2308         +1167.7%      29267 ± 26%  perf-c2c.HITM.local
      2499         +1078.3%      29447 ± 26%  perf-c2c.HITM.total
      1413 ±  8%     -13.8%       1218 ±  4%  sched_debug.cfs_rq:/.runnable_avg.max
     28302           +21.2%      34303 ±  2%  sched_debug.cpu.nr_switches.avg
     39625 ±  6%     +63.4%      64761 ±  6%  sched_debug.cpu.nr_switches.max
      4170 ±  9%    +126.1%       9429 ±  8%  sched_debug.cpu.nr_switches.stddev
   1606932           +25.9%    2023746 ±  3%  stress-ng.sockmany.ops
     26687           +26.0%      33624 ±  3%  stress-ng.sockmany.ops_per_sec
   1561801           +28.1%    2000939 ±  3%  stress-ng.time.involuntary_context_switches
   1731525           +22.3%    2118259 ±  2%  stress-ng.time.voluntary_context_switches
     84783            +2.6%      86953        proc-vmstat.nr_shmem
      5339 ±  6%     -26.4%       3931 ± 16%  proc-vmstat.numa_hint_faults_local
    878479            +6.8%     937819        proc-vmstat.numa_hit
    812262            +7.3%     871615        proc-vmstat.numa_local
   2550690           +12.5%    2870404        proc-vmstat.pgalloc_normal
   2407108           +13.2%    2724922        proc-vmstat.pgfree
     21.96           -17.2%      18.18 ±  2%  perf-stat.i.MPKI
 7.517e+09           +18.8%  8.933e+09        perf-stat.i.branch-instructions
      2.70            -0.7        1.96        perf-stat.i.branch-miss-rate%
  2.03e+08           -13.1%  1.765e+08        perf-stat.i.branch-misses
     60.22            -2.3       57.89 ±  2%  perf-stat.i.cache-miss-rate%
 1.472e+09            +4.7%  1.542e+09        perf-stat.i.cache-references
     56669           +22.3%      69301 ±  2%  perf-stat.i.context-switches
      5.56           -18.4%       4.53 ±  2%  perf-stat.i.cpi
  4.24e+10           +19.2%  5.054e+10        perf-stat.i.instructions
      0.20           +20.1%       0.24 ±  4%  perf-stat.i.ipc
      0.49           +21.0%       0.60 ±  8%  perf-stat.i.metric.K/sec
     21.03           -15.1%      17.85        perf-stat.overall.MPKI
      2.70            -0.7        1.98        perf-stat.overall.branch-miss-rate%
     60.56            -2.1       58.49        perf-stat.overall.cache-miss-rate%
      5.34           -16.6%       4.45        perf-stat.overall.cpi
    253.77            -1.7%     249.50        perf-stat.overall.cycles-between-cache-misses
      0.19           +19.9%       0.22        perf-stat.overall.ipc
 7.395e+09           +18.9%  8.789e+09        perf-stat.ps.branch-instructions
 1.997e+08           -13.0%  1.737e+08        perf-stat.ps.branch-misses
 1.448e+09            +4.7%  1.517e+09        perf-stat.ps.cache-references
     55820           +22.2%      68204 ±  2%  perf-stat.ps.context-switches
 4.172e+10           +19.2%  4.972e+10        perf-stat.ps.instructions
 2.556e+12           +20.2%  3.072e+12 ±  2%  perf-stat.total.instructions
      0.35 ±  9%     -14.9%       0.29 ±  6%  perf-sched.sch_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      0.06 ±  7%     -20.5%       0.04 ±  4%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
      0.16 ±218%    +798.3%       1.44 ± 40%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.25 ±152%    +291.3%       0.99 ± 45%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
      0.11 ±166%    +568.2%       0.75 ± 45%  perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
      0.84 ± 14%     +39.2%       1.17 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.11 ± 22%    +108.5%       0.23 ± 12%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.08 ± 59%     -60.0%       0.03 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      0.16 ±218%   +1286.4%       2.22 ± 25%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.13 ±153%    +910.1%       1.27 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
      9.23           -12.5%       8.08        perf-sched.total_wait_and_delay.average.ms
    139892           +15.3%     161338        perf-sched.total_wait_and_delay.count.ms
      9.18           -12.5%       8.03        perf-sched.total_wait_time.average.ms
      0.70 ±  8%     -14.5%       0.60 ±  6%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      0.11 ±  8%     -20.1%       0.09 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
    429.48 ± 44%     +63.6%     702.60 ± 11%  perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      4.97           -14.0%       4.28        perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.23 ± 21%    +104.2%       0.46 ± 12%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     48576 ±  5%     +36.3%      66215 ±  2%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
     81.83            +9.8%      89.83 ±  2%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     64098           +16.3%      74560        perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
     15531 ± 17%     -46.2%       8355 ±  6%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.36 ±  8%     -14.2%       0.31 ±  6%  perf-sched.wait_time.avg.ms.__cond_resched.__inet_hash_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
      0.06 ±  7%     -20.2%       0.04 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
      0.04 ±178%     -94.4%       0.00 ±130%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.16 ±218%    +798.5%       1.44 ± 40%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.11 ±166%    +568.6%       0.75 ± 45%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect
    427.69 ± 45%     +63.1%     697.48 ± 10%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      4.95           -14.0%       4.26        perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      0.12 ± 20%     +99.9%       0.23 ± 12%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      0.16 ±218%   +1286.4%       2.22 ± 25%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.sock_alloc_file
      0.13 ±153%    +911.4%       1.27 ± 34%  perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.inet_stream_connect.__sys_connect.__x64_sys_connect


***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockmany/stress-ng/60s

commit: 
  f252f23ab6 ("net: Prevent use after free in netif_napi_set_irq_locked()")
  265acc444f ("inet: change lport contribution to inet_ehashfn() and inet6_ehashfn()")

f252f23ab657cd22 265acc444f8a96246e9d42b54b6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    205766            +3.2%     212279        vmstat.system.cs
    309724 ±  5%     +63.6%     506684 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    309724 ±  5%     +63.6%     506684 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
   1307371 ±  8%     -14.5%    1117523 ±  7%  sched_debug.cpu.avg_idle.max
   4333131            +4.4%    4525951        stress-ng.sockmany.ops
     71816            +4.4%      74988        stress-ng.sockmany.ops_per_sec
   7639150            +3.6%    7910527        stress-ng.time.voluntary_context_switches
    693603           -18.6%     564616 ±  3%  perf-c2c.DRAM.local
    611374           -16.8%     508688 ±  2%  perf-c2c.DRAM.remote
     19509          +994.2%     213470 ±  7%  perf-c2c.HITM.local
     20252          +957.6%     214187 ±  7%  perf-c2c.HITM.total
    204521            +3.1%     210765        proc-vmstat.nr_shmem
    938137            +2.9%     965493        proc-vmstat.nr_slab_reclaimable
   3102658            +3.0%    3196837        proc-vmstat.nr_slab_unreclaimable
   2113801            +1.8%    2151131        proc-vmstat.numa_hit
   1881174            +2.0%    1919223        proc-vmstat.numa_local
   6186586            +3.6%    6406837        proc-vmstat.pgalloc_normal
      0.76 ± 46%     -83.0%       0.13 ±144%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      0.02 ±  2%      -6.3%       0.02 ±  2%  perf-sched.sch_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
     15.43           -12.6%      13.48        perf-sched.total_wait_and_delay.average.ms
    234971           +15.6%     271684        perf-sched.total_wait_and_delay.count.ms
     15.37           -12.6%      13.43        perf-sched.total_wait_time.average.ms
    140.18 ±  5%     -37.2%      88.02 ± 11%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     10.17           -14.1%       8.74        perf-sched.wait_and_delay.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      4.02          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    104089           +16.4%     121193        perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.__inet_stream_connect.inet_stream_connect
     88.17 ±  6%     +68.1%     148.17 ± 13%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    108724           +16.8%     127034        perf-sched.wait_and_delay.count.schedule_timeout.inet_csk_accept.inet_accept.do_accept
      1232          -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      4592 ± 12%     +26.1%       5792 ± 14%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     11.29 ± 68%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      9.99           -13.3%       8.66        perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
    139.53 ±  6%     -37.2%      87.60 ± 11%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     10.15           -14.1%       8.72        perf-sched.wait_time.avg.ms.schedule_timeout.inet_csk_accept.inet_accept.do_accept
     41.10           -17.2%      34.03        perf-stat.i.MPKI
 1.424e+10           +14.6%  1.631e+10        perf-stat.i.branch-instructions
      2.28            -0.1        2.17        perf-stat.i.branch-miss-rate%
 3.193e+08            +9.4%  3.492e+08        perf-stat.i.branch-misses
     77.01            -9.5       67.48        perf-stat.i.cache-miss-rate%
 2.981e+09            -5.1%   2.83e+09        perf-stat.i.cache-misses
 3.806e+09            +8.4%  4.127e+09        perf-stat.i.cache-references
    217129            +3.2%     224056        perf-stat.i.context-switches
      8.68           -12.7%       7.58        perf-stat.i.cpi
    242.24            +4.0%     251.97        perf-stat.i.cycles-between-cache-misses
 7.608e+10           +14.1%  8.679e+10        perf-stat.i.instructions
      0.13           +13.3%       0.15        perf-stat.i.ipc
     39.15           -16.8%      32.58        perf-stat.overall.MPKI
      2.24            -0.1        2.14        perf-stat.overall.branch-miss-rate%
     78.30            -9.7       68.56        perf-stat.overall.cache-miss-rate%
      8.35           -12.4%       7.31        perf-stat.overall.cpi
    213.17            +5.3%     224.53        perf-stat.overall.cycles-between-cache-misses
      0.12           +14.1%       0.14        perf-stat.overall.ipc
 1.401e+10           +14.6%  1.604e+10        perf-stat.ps.branch-instructions
 3.139e+08            +9.4%  3.434e+08        perf-stat.ps.branch-misses
 2.931e+09            -5.1%  2.782e+09        perf-stat.ps.cache-misses
 3.743e+09            +8.4%  4.058e+09        perf-stat.ps.cache-references
    213541            +3.3%     220574        perf-stat.ps.context-switches
 7.485e+10           +14.1%  8.539e+10        perf-stat.ps.instructions
 4.597e+12           +13.9%  5.235e+12        perf-stat.total.instructions





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

