Message-ID: <202404012326.d995728e-oliver.sang@intel.com>
Date: Mon, 1 Apr 2024 23:25:11 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Paolo Abeni <pabeni@...hat.com>, Guillaume Nault <gnault@...hat.com>,
	Kuniyuki Iwashima <kuniyu@...zon.com>, Willem de Bruijn <willemb@...gle.com>,
	<netdev@...r.kernel.org>, <ying.huang@...el.com>, <feng.tang@...el.com>,
	<fengwei.yin@...el.com>, <oliver.sang@...el.com>
Subject: [linus:master] [sock_diag]  f44e64990b:
 stress-ng.sockdiag.ops_per_sec 147.0% improvement



Hello,

kernel test robot noticed a 147.0% improvement in stress-ng.sockdiag.ops_per_sec on:


commit: f44e64990beb41167bd7c313d90bcf7e290c3582 ("sock_diag: remove sock_diag_mutex")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: sockdiag
	cpufreq_governor: performance
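
For orientation, the call path in the profile further down (sendmsg ->
netlink_sendmsg -> sock_diag_rcv_msg -> unix_diag_handler_dump ->
unix_diag_dump) is the path taken by a NETLINK_SOCK_DIAG dump request for
AF_UNIX sockets. Below is a minimal sketch of such a request, assuming the
struct layouts from <linux/netlink.h> and <linux/unix_diag.h>. It is not the
stress-ng source. The helper names build_unix_diag_request() and
dump_unix_sockets() are hypothetical, and the chosen udiag_states/udiag_show
values are illustrative assumptions.

```python
import socket
import struct

# Constants copied from the Linux UAPI headers (Python's socket module
# does not necessarily export all of them).
AF_UNIX = 1
NETLINK_SOCK_DIAG = 4        # <linux/netlink.h>
SOCK_DIAG_BY_FAMILY = 20     # <linux/sock_diag.h>
NLM_F_REQUEST = 0x01         # <linux/netlink.h>
NLM_F_DUMP = 0x300           # NLM_F_ROOT | NLM_F_MATCH
UDIAG_SHOW_NAME = 0x01       # <linux/unix_diag.h>

NLMSG_HDR = "=IHHII"         # struct nlmsghdr: len, type, flags, seq, pid
UNIX_DIAG_REQ = "=BBHIII2I"  # struct unix_diag_req: family, protocol, pad,
                             # states, ino, show, cookie[2]


def build_unix_diag_request(seq=1):
    """Pack one netlink message asking for a dump of all AF_UNIX sockets."""
    payload = struct.pack(
        UNIX_DIAG_REQ,
        AF_UNIX,          # sdiag_family
        0,                # sdiag_protocol
        0,                # pad
        0xFFFFFFFF,       # udiag_states: match every socket state
        0,                # udiag_ino: unused for a dump
        UDIAG_SHOW_NAME,  # udiag_show: ask for the socket name attribute
        0, 0,             # udiag_cookie
    )
    header = struct.pack(
        NLMSG_HDR,
        struct.calcsize(NLMSG_HDR) + len(payload),  # nlmsg_len
        SOCK_DIAG_BY_FAMILY,                        # nlmsg_type
        NLM_F_REQUEST | NLM_F_DUMP,                 # nlmsg_flags
        seq,                                        # nlmsg_seq
        0,                                          # nlmsg_pid
    )
    return header + payload


def dump_unix_sockets():
    """Send the request to the kernel and return the first raw reply (Linux only)."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_SOCK_DIAG) as s:
        s.sendto(build_unix_diag_request(), (0, 0))  # (pid 0, no groups) = kernel
        return s.recv(65536)
```

Many threads issuing requests like this concurrently is the kind of load under
which a single global mutex in sock_diag_rcv() would serialize every caller,
which matches the osq_lock share in the profile for the parent commit.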






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240401/202404012326.d995728e-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/sockdiag/stress-ng/60s

commit: 
  86e8921df0 ("sock_diag: allow concurrent operation in sock_diag_rcv_msg()")
  f44e64990b ("sock_diag: remove sock_diag_mutex")

86e8921df05c6e94 f44e64990beb41167bd7c313d90 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      6805 ± 37%    +630.7%      49725 ±137%  numa-meminfo.node0.Active
      6767 ± 37%    +634.2%      49687 ±138%  numa-meminfo.node0.Active(anon)
      7690          +906.4%      77394 ± 48%  vmstat.system.cs
    420471            +6.2%     446552        vmstat.system.in
 2.491e+08          +147.0%  6.154e+08 ± 40%  stress-ng.sockdiag.ops
   4152375          +147.0%   10257279 ± 40%  stress-ng.sockdiag.ops_per_sec
     86849          +350.0%     390836 ± 28%  stress-ng.time.involuntary_context_switches
      0.55            -0.3        0.30 ± 20%  mpstat.cpu.all.irq%
      0.10 ±  3%      +0.0        0.15 ± 22%  mpstat.cpu.all.soft%
      0.46            +0.1        0.54 ±  2%  mpstat.cpu.all.usr%
     46.33 ± 12%     -84.2%       7.33 ± 84%  mpstat.max_utilization.seconds
   2234616 ±  2%    +136.2%    5279086 ± 37%  numa-numastat.node0.local_node
   2378097          +124.6%    5342166 ± 36%  numa-numastat.node0.numa_hit
   2678667 ±  2%    +108.5%    5584120 ± 35%  numa-numastat.node1.local_node
   2768310 ±  3%    +107.9%    5755443 ± 34%  numa-numastat.node1.numa_hit
   1211899           +13.3%    1372481 ±  2%  meminfo.Inactive
   1211695           +13.3%    1372284 ±  2%  meminfo.Inactive(anon)
    540274           +25.9%     680362 ±  7%  meminfo.Mapped
    449208            +8.6%     487827 ±  7%  meminfo.SUnreclaim
    862353           +23.0%    1060355 ±  3%  meminfo.Shmem
    161.00 ± 21%    +579.5%       1094 ± 64%  perf-c2c.DRAM.local
      1480 ± 15%    +661.4%      11271 ± 57%  perf-c2c.DRAM.remote
      1391 ± 14%   +1182.4%      17843 ± 65%  perf-c2c.HITM.local
    585.00 ± 10%   +1199.8%       7604 ± 59%  perf-c2c.HITM.remote
      1976 ± 13%   +1187.6%      25447 ± 63%  perf-c2c.HITM.total
    965151 ±  3%     -47.0%     511917 ±  6%  sched_debug.cpu.avg_idle.avg
    225203 ± 48%     -84.8%      34261 ±130%  sched_debug.cpu.avg_idle.min
      1759 ±  6%    +542.3%      11302 ± 45%  sched_debug.cpu.nr_switches.avg
    899.42          +738.6%       7542 ± 42%  sched_debug.cpu.nr_switches.min
    -30.17          +221.8%     -97.08        sched_debug.cpu.nr_uninterruptible.min
      1739 ± 37%    +612.9%      12403 ±138%  numa-vmstat.node0.nr_active_anon
      1739 ± 37%    +612.9%      12403 ±138%  numa-vmstat.node0.nr_zone_active_anon
   2377796          +124.5%    5337172 ± 36%  numa-vmstat.node0.numa_hit
   2234316 ±  2%    +136.0%    5274091 ± 37%  numa-vmstat.node0.numa_local
   2767474 ±  3%    +107.8%    5750481 ± 34%  numa-vmstat.node1.numa_hit
   2677832 ±  2%    +108.3%    5579160 ± 35%  numa-vmstat.node1.numa_local
    980143            +5.0%    1028901        proc-vmstat.nr_file_pages
    303091           +13.2%     342957 ±  2%  proc-vmstat.nr_inactive_anon
     40864            +1.6%      41510        proc-vmstat.nr_kernel_stack
    135507           +25.7%     170340 ±  7%  proc-vmstat.nr_mapped
    215970           +22.6%     264729 ±  3%  proc-vmstat.nr_shmem
     41429            +7.8%      44664 ±  7%  proc-vmstat.nr_slab_reclaimable
    112306            +8.7%     122083 ±  7%  proc-vmstat.nr_slab_unreclaimable
    303091           +13.2%     342957 ±  2%  proc-vmstat.nr_zone_inactive_anon
     37590 ± 28%     +51.2%      56819 ± 18%  proc-vmstat.numa_hint_faults
   5148855          +115.5%   11093970 ± 35%  proc-vmstat.numa_hit
   4915589          +120.9%   10859566 ± 36%  proc-vmstat.numa_local
    206083 ± 27%     +58.4%     326447 ± 14%  proc-vmstat.numa_pte_updates
  32486467          +143.2%   79020889 ± 39%  proc-vmstat.pgalloc_normal
    759303           +16.3%     882814 ±  3%  proc-vmstat.pgfault
  32050628          +144.9%   78486695 ± 40%  proc-vmstat.pgfree
      0.13 ±  7%    +536.1%       0.85 ± 21%  perf-stat.i.MPKI
 3.083e+10           -56.4%  1.344e+10 ±  2%  perf-stat.i.branch-instructions
      0.19 ±  3%   +6870.7        6870 ±104%  perf-stat.i.branch-miss-rate%
  42989880 ±  2%    +2e+06%  8.623e+11 ±101%  perf-stat.i.branch-misses
  16796111 ±  9%    +189.6%   48642444 ± 25%  perf-stat.i.cache-misses
  68857289 ±  5%    +196.5%  2.042e+08 ± 12%  perf-stat.i.cache-references
      7918          +929.7%      81533 ± 44%  perf-stat.i.context-switches
      3.94          +165.6%      10.46        perf-stat.i.cpi
     39043 ± 10%     -64.0%      14047 ± 19%  perf-stat.i.cycles-between-cache-misses
 1.541e+11           -62.6%   5.76e+10 ±  3%  perf-stat.i.instructions
      0.26           -61.2%       0.10        perf-stat.i.ipc
      0.10 ± 92%    +479.0%       0.56 ± 28%  perf-stat.i.major-faults
     12344           +20.7%      14898 ±  3%  perf-stat.i.minor-faults
     12345           +20.7%      14899 ±  3%  perf-stat.i.page-faults
      0.11 ±  9%    +685.5%       0.84 ± 21%  perf-stat.overall.MPKI
      0.12 ±  2%   +9674.7        9674 ±101%  perf-stat.overall.branch-miss-rate%
      4.00          +166.1%      10.63        perf-stat.overall.cpi
     37756 ± 10%     -65.1%      13184 ± 18%  perf-stat.overall.cycles-between-cache-misses
      0.25           -62.4%       0.09        perf-stat.overall.ipc
 2.952e+10           -56.5%  1.284e+10 ±  2%  perf-stat.ps.branch-instructions
  35366132 ±  2%  +3.6e+06%  1.256e+12 ±100%  perf-stat.ps.branch-misses
  15767609 ±  9%    +194.8%   46490063 ± 26%  perf-stat.ps.cache-misses
  67236264 ±  4%    +194.1%  1.977e+08 ± 12%  perf-stat.ps.cache-references
      7505          +941.8%      78193 ± 47%  perf-stat.ps.context-switches
 1.475e+11           -62.7%  5.497e+10 ±  3%  perf-stat.ps.instructions
      0.08 ± 88%    +399.3%       0.41 ± 28%  perf-stat.ps.major-faults
     10427 ±  2%     +19.6%      12474 ±  3%  perf-stat.ps.minor-faults
     10428 ±  2%     +19.6%      12475 ±  3%  perf-stat.ps.page-faults
  8.86e+12           -62.6%  3.315e+12 ±  3%  perf-stat.total.instructions
     99.55           -99.6        0.00        perf-profile.calltrace.cycles-pp.sock_diag_rcv.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg
     99.10           -99.1        0.00        perf-profile.calltrace.cycles-pp.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg.____sys_sendmsg
     98.57           -98.6        0.00        perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg
     99.57           -62.8       36.75 ±107%  perf-profile.calltrace.cycles-pp.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg
     99.58           -62.8       36.82 ±107%  perf-profile.calltrace.cycles-pp.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64
     99.58           -62.8       36.82 ±107%  perf-profile.calltrace.cycles-pp.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
     99.60           -62.8       36.84 ±107%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
     99.60           -62.8       36.84 ±107%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendmsg
     99.60           -62.8       36.84 ±107%  perf-profile.calltrace.cycles-pp.sendmsg
     99.59           -62.8       36.83 ±107%  perf-profile.calltrace.cycles-pp.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
     99.59           -62.8       36.83 ±107%  perf-profile.calltrace.cycles-pp.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
      0.00           +36.2       36.16 ±108%  perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_diag_dump.netlink_dump.__netlink_dump_start.unix_diag_handler_dump
      0.00           +36.7       36.65 ±107%  perf-profile.calltrace.cycles-pp.unix_diag_dump.netlink_dump.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg
      0.00           +36.7       36.69 ±107%  perf-profile.calltrace.cycles-pp.netlink_dump.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb
      0.00           +36.7       36.70 ±107%  perf-profile.calltrace.cycles-pp.__netlink_dump_start.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast
      0.00           +36.7       36.70 ±107%  perf-profile.calltrace.cycles-pp.unix_diag_handler_dump.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast.netlink_sendmsg
      0.00           +36.7       36.72 ±107%  perf-profile.calltrace.cycles-pp.sock_diag_rcv_msg.netlink_rcv_skb.netlink_unicast.netlink_sendmsg.____sys_sendmsg
      0.00           +36.7       36.72 ±107%  perf-profile.calltrace.cycles-pp.netlink_rcv_skb.netlink_unicast.netlink_sendmsg.____sys_sendmsg.___sys_sendmsg
     99.55           -99.6        0.00        perf-profile.children.cycles-pp.sock_diag_rcv
     99.10           -99.1        0.00        perf-profile.children.cycles-pp.__mutex_lock
     98.60           -98.6        0.00        perf-profile.children.cycles-pp.osq_lock
     99.57           -62.8       36.75 ±107%  perf-profile.children.cycles-pp.netlink_unicast
     99.58           -62.8       36.82 ±107%  perf-profile.children.cycles-pp.netlink_sendmsg
     99.58           -62.8       36.82 ±107%  perf-profile.children.cycles-pp.____sys_sendmsg
     99.60           -62.8       36.85 ±107%  perf-profile.children.cycles-pp.sendmsg
     99.59           -62.8       36.83 ±107%  perf-profile.children.cycles-pp.___sys_sendmsg
     99.59           -62.8       36.83 ±107%  perf-profile.children.cycles-pp.__sys_sendmsg
      0.51 ±  2%      -0.3        0.22 ± 27%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.50 ±  2%      -0.3        0.21 ± 27%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.62 ±  2%      -0.3        0.35 ± 18%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.64 ±  2%      -0.3        0.37 ± 15%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.38 ±  3%      -0.2        0.15 ± 22%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
      0.39 ±  3%      -0.2        0.17 ± 16%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.36 ±  4%      -0.2        0.14 ± 21%  perf-profile.children.cycles-pp.tick_sched_handle
      0.36 ±  3%      -0.2        0.14 ± 21%  perf-profile.children.cycles-pp.update_process_times
      0.31 ±  4%      -0.2        0.12 ± 19%  perf-profile.children.cycles-pp.scheduler_tick
      0.24 ±  3%      -0.2        0.08 ± 31%  perf-profile.children.cycles-pp.task_tick_fair
      0.17 ±  6%      -0.0        0.12 ± 14%  perf-profile.children.cycles-pp.main
      0.17 ±  6%      -0.0        0.12 ± 14%  perf-profile.children.cycles-pp.run_builtin
      0.17 ±  6%      -0.0        0.13 ± 15%  perf-profile.children.cycles-pp.cmd_record
      0.17 ±  5%      -0.0        0.12 ± 14%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.16 ±  5%      -0.0        0.12 ± 12%  perf-profile.children.cycles-pp.perf_mmap__push
      0.09 ±  5%      -0.0        0.08 ± 10%  perf-profile.children.cycles-pp.writen
      0.09 ±  4%      -0.0        0.08 ± 10%  perf-profile.children.cycles-pp.write
      0.08 ±  5%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.ksys_write
      0.07 ±  5%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.10            +0.0        0.13 ±  5%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.09 ±  4%      +0.0        0.13 ± 26%  perf-profile.children.cycles-pp.rcu_core
      0.10 ±  3%      +0.0        0.14 ± 17%  perf-profile.children.cycles-pp.__do_softirq
      0.05            +0.1        0.12 ± 48%  perf-profile.children.cycles-pp.__sys_recvmsg
      0.06 ±  8%      +0.1        0.14 ± 47%  perf-profile.children.cycles-pp.recvmsg
      0.00            +0.1        0.09 ± 48%  perf-profile.children.cycles-pp.netlink_recvmsg
      0.00            +0.1        0.09 ± 48%  perf-profile.children.cycles-pp.sock_recvmsg
      0.02 ± 99%      +0.1        0.12 ± 47%  perf-profile.children.cycles-pp.___sys_recvmsg
      0.00            +0.1        0.10 ± 49%  perf-profile.children.cycles-pp.____sys_recvmsg
      0.07            +0.1        0.18 ± 49%  perf-profile.children.cycles-pp.sk_diag_fill
      0.00            +0.1        0.12 ± 62%  perf-profile.children.cycles-pp._raw_read_lock
      0.00            +0.2        0.18 ± 61%  perf-profile.children.cycles-pp.sock_i_ino
      0.00            +0.9        0.85 ± 60%  perf-profile.children.cycles-pp.__wake_up
      0.00            +1.1        1.07 ± 60%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.00           +23.6       23.58 ± 63%  perf-profile.children.cycles-pp.netlink_create
      0.00           +23.6       23.62 ± 63%  perf-profile.children.cycles-pp.__sock_create
      0.00           +23.7       23.65 ± 62%  perf-profile.children.cycles-pp.__sys_socket
      0.00           +23.7       23.65 ± 62%  perf-profile.children.cycles-pp.__x64_sys_socket
      0.00           +23.7       23.66 ± 62%  perf-profile.children.cycles-pp.__socket
      0.29           +35.9       36.21 ±108%  perf-profile.children.cycles-pp._raw_spin_lock
      0.42           +36.3       36.67 ±107%  perf-profile.children.cycles-pp.unix_diag_dump
      0.44           +36.3       36.70 ±107%  perf-profile.children.cycles-pp.__netlink_dump_start
      0.44           +36.3       36.70 ±107%  perf-profile.children.cycles-pp.unix_diag_handler_dump
      0.44           +36.3       36.72 ±107%  perf-profile.children.cycles-pp.sock_diag_rcv_msg
      0.44           +36.3       36.72 ±107%  perf-profile.children.cycles-pp.netlink_rcv_skb
      0.44           +36.3       36.73 ±107%  perf-profile.children.cycles-pp.netlink_dump
      0.00           +38.9       38.94 ± 63%  perf-profile.children.cycles-pp.__sock_release
      0.00           +38.9       38.94 ± 63%  perf-profile.children.cycles-pp.netlink_release
      0.00           +38.9       38.94 ± 63%  perf-profile.children.cycles-pp.sock_close
      0.00           +39.0       38.98 ± 63%  perf-profile.children.cycles-pp.__fput
      0.00           +39.0       38.99 ± 62%  perf-profile.children.cycles-pp.__x64_sys_close
      0.00           +39.0       39.01 ± 62%  perf-profile.children.cycles-pp.__close
      0.00           +94.8       94.81        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     98.02           -98.0        0.00        perf-profile.self.cycles-pp.osq_lock
      0.06 ±  6%      +0.1        0.15 ± 54%  perf-profile.self.cycles-pp.unix_diag_dump
      0.00            +0.1        0.11 ± 60%  perf-profile.self.cycles-pp._raw_read_lock
      0.00            +0.3        0.25 ± 51%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.26 ±  2%      +2.6        2.82 ± 72%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00           +94.7       94.65 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
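
As a reading aid for the table above: the left value column belongs to the
parent commit 86e8921df0 and the right one to f44e64990b. Judging from the
numbers themselves, rows whose change ends in "%" are relative changes of the
right value against the left one, while rows without "%" (the mpstat and
perf-profile percentages) are absolute differences. The sketch below
recomputes a few relative rows from the printed values. The function name
pct_change() is only illustrative and is not part of the lkp tooling. Since
the table prints rounded means, a recomputed figure may differ in the last
digit for other rows.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (value at 86e8921df0, value at f44e64990b), copied from the table
rows = {
    "stress-ng.sockdiag.ops_per_sec": (4152375, 10257279),
    "vmstat.system.cs": (7690, 77394),
    "mpstat.max_utilization.seconds": (46.33, 7.33),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+8.1f}%  {name}")
```

The three printed values agree with the +147.0%, +906.4% and -84.2% shown in
the %change column.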




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

