Message-ID: <20220117143745.GA6098@xsang-OptiPlex-9020>
Date:   Mon, 17 Jan 2022 22:37:45 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Kuniyuki Iwashima <kuniyu@...zon.co.jp>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com
Subject: [af_unix]  afd20b9290:  stress-ng.sockdiag.ops_per_sec -26.3%
 regression


(this commit was previously reported as
"[af_unix]  afd20b9290:  stress-ng.sockdiag.ops_per_sec -26.3% regression"
when it was still on linux-next/master:
https://lore.kernel.org/all/20211219083847.GA14057@xsang-OptiPlex-9020/
We report it again as a reminder that the regression still exists on mainline.)

Greetings,

FYI, we noticed a -26.3% regression of stress-ng.sockdiag.ops_per_sec due to commit:


commit: afd20b9290e184c203fe22f2d6b80dc7127ba724 ("af_unix: Replace the big lock with small locks.")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
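The commit replaces af_unix's single global hash-table lock with per-bucket locks. As a rough illustration of that locking pattern only (a hypothetical Python sketch, not the kernel code; all names here are invented):

```python
import threading

# Sketch of lock splitting: instead of one global lock guarding the whole
# hash table, each bucket gets its own lock, so operations on different
# buckets no longer contend with each other.
N_BUCKETS = 256

class ShardedTable:
    def __init__(self):
        self.buckets = [{} for _ in range(N_BUCKETS)]
        self.locks = [threading.Lock() for _ in range(N_BUCKETS)]

    def _index(self, key):
        return hash(key) % N_BUCKETS

    def insert(self, key, value):
        i = self._index(key)
        with self.locks[i]:          # lock only this bucket
            self.buckets[i][key] = value

    def lookup(self, key):
        i = self._index(key)
        with self.locks[i]:
            return self.buckets[i].get(key)

t = ShardedTable()
t.insert("sock-1", object())
print(t.lookup("sock-1") is not None)  # True
```

As the profile below suggests, the sockdiag workload serializes on the sock_diag netlink mutex rather than on the af_unix table locks, so splitting the table lock does not help this particular test.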

in testcase: stress-ng
on test machine: 128 threads, 2 sockets, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, 128G memory
with the following parameters:

	nr_threads: 100%
	testtime: 60s
	class: network
	test: sockdiag
	cpufreq_governor: performance
	ucode: 0xd000280




If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.
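For reference, the sockdiag stressor exercises the sock_diag netlink interface by repeatedly dumping unix-domain sockets. A minimal sketch of the request such a dump sends, built with Python's struct module from the layouts in <linux/netlink.h> and <linux/unix_diag.h> (the helper name is invented; this only constructs the message bytes, it does not talk to the kernel):

```python
import struct

# Constants from <linux/netlink.h> and <linux/unix_diag.h>
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300          # NLM_F_ROOT | NLM_F_MATCH
SOCK_DIAG_BY_FAMILY = 20
AF_UNIX = 1
UDIAG_SHOW_NAME = 0x01
UDIAG_SHOW_PEER = 0x04

def build_unix_diag_dump_request(seq=1):
    """Build a SOCK_DIAG_BY_FAMILY dump request for AF_UNIX sockets:
    struct nlmsghdr (16 bytes) followed by struct unix_diag_req (24 bytes)."""
    payload = struct.pack(
        "=BBHIIIII",
        AF_UNIX,                            # sdiag_family
        0,                                  # sdiag_protocol
        0,                                  # padding
        0xFFFFFFFF,                         # udiag_states: all states
        0,                                  # udiag_ino: no specific inode
        UDIAG_SHOW_NAME | UDIAG_SHOW_PEER,  # udiag_show
        0, 0,                               # udiag_cookie[2]
    )
    header = struct.pack(
        "=IHHII",
        16 + len(payload),                  # nlmsg_len
        SOCK_DIAG_BY_FAMILY,                # nlmsg_type
        NLM_F_REQUEST | NLM_F_DUMP,         # nlmsg_flags
        seq,                                # nlmsg_seq
        0,                                  # nlmsg_pid
    )
    return header + payload

msg = build_unix_diag_dump_request()
print(len(msg))  # 40
```

Each such dump is handled by sock_diag_rcv() under a single mutex, which is where the profile below shows the cycles going.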

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  network/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp6/sockdiag/stress-ng/60s/0xd000280

commit: 
  e6b4b87389 ("af_unix: Save hash in sk_hash.")
  afd20b9290 ("af_unix: Replace the big lock with small locks.")

e6b4b873896f0e92 afd20b9290e184c203fe22f2d6b 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 3.129e+08           -26.3%  2.306e+08        stress-ng.sockdiag.ops
   5214640           -26.3%    3842782        stress-ng.sockdiag.ops_per_sec
     82895            -6.9%      77178        stress-ng.time.involuntary_context_switches
    103737            -9.5%      93892        stress-ng.time.voluntary_context_switches
      7067            -6.3%       6620        vmstat.system.cs
      0.05            -0.0        0.04 ±  6%  mpstat.cpu.all.soft%
      0.13 ±  3%      -0.0        0.12 ±  5%  mpstat.cpu.all.usr%
   1783836 ±  7%     -21.6%    1397649 ± 12%  numa-vmstat.node1.numa_hit
   1689477 ±  8%     -22.9%    1303128 ± 13%  numa-vmstat.node1.numa_local
    894897 ± 22%     +46.6%    1312222 ± 11%  turbostat.C1E
      3.85 ± 55%      +3.5        7.33 ± 10%  turbostat.C1E%
   2451882 ±  4%     -24.3%    1855676 ±  2%  numa-numastat.node0.local_node
   2501404 ±  3%     -23.8%    1905161 ±  3%  numa-numastat.node0.numa_hit
   2437526           -24.1%    1849165 ±  3%  numa-numastat.node1.local_node
   2503693           -23.5%    1915338 ±  3%  numa-numastat.node1.numa_hit
      7977 ± 19%     -22.6%       6178 ±  8%  softirqs.CPU2.RCU
      7989 ± 25%     -23.4%       6121 ±  3%  softirqs.CPU25.RCU
      8011 ± 24%     -26.8%       5862 ±  3%  softirqs.CPU8.RCU
    890963 ±  3%     -17.4%     735738        softirqs.RCU
     74920            -3.6%      72233        proc-vmstat.nr_slab_unreclaimable
   5007343           -23.7%    3821593        proc-vmstat.numa_hit
   4891675           -24.2%    3705934        proc-vmstat.numa_local
   5007443           -23.7%    3821701        proc-vmstat.pgalloc_normal
   4796850           -24.7%    3610677        proc-vmstat.pgfree
      0.71 ± 17%     -41.1%       0.42        perf-stat.i.MPKI
      0.12 ± 12%      -0.0        0.10 ±  8%  perf-stat.i.branch-miss-rate%
  10044516 ± 13%     -23.6%    7678759 ±  3%  perf-stat.i.cache-misses
  42758000 ±  6%     -28.5%   30580693        perf-stat.i.cache-references
      6920            -5.9%       6510        perf-stat.i.context-switches
    571.08 ±  2%     -13.4%     494.31 ±  2%  perf-stat.i.cpu-migrations
     39356 ± 12%     +29.2%      50865 ±  3%  perf-stat.i.cycles-between-cache-misses
      0.01 ± 36%      -0.0        0.00 ± 24%  perf-stat.i.dTLB-load-miss-rate%
      0.01 ± 23%      -0.0        0.00 ± 14%  perf-stat.i.dTLB-store-miss-rate%
 8.447e+08           +27.0%  1.073e+09        perf-stat.i.dTLB-stores
     13.36            -2.2%      13.07        perf-stat.i.major-faults
    364.56 ±  9%     -24.9%     273.60        perf-stat.i.metric.K/sec
    350.63            +0.7%     353.23        perf-stat.i.metric.M/sec
     87.88            +1.4       89.23        perf-stat.i.node-load-miss-rate%
   1381985 ± 12%     -27.7%     999393 ±  3%  perf-stat.i.node-load-misses
    198989 ±  6%     -31.9%     135458 ±  4%  perf-stat.i.node-loads
   4305132           -27.4%    3124590        perf-stat.i.node-store-misses
    581796 ±  5%     -25.6%     432807 ±  3%  perf-stat.i.node-stores
      0.46 ±  5%     -28.7%       0.33        perf-stat.overall.MPKI
     39894 ± 12%     +28.6%      51310 ±  3%  perf-stat.overall.cycles-between-cache-misses
      0.01 ± 22%      -0.0        0.00 ± 12%  perf-stat.overall.dTLB-store-miss-rate%
   9916145 ± 13%     -23.8%    7560589 ±  3%  perf-stat.ps.cache-misses
  42385546 ±  5%     -28.7%   30225277        perf-stat.ps.cache-references
      6786            -5.9%       6385        perf-stat.ps.context-switches
    562.65 ±  2%     -13.5%     486.73 ±  2%  perf-stat.ps.cpu-migrations
 8.314e+08           +26.8%  1.055e+09        perf-stat.ps.dTLB-stores
   1359293 ± 11%     -27.7%     982331 ±  3%  perf-stat.ps.node-load-misses
    205280 ±  6%     -33.3%     136979 ±  5%  perf-stat.ps.node-loads
   4237942           -27.5%    3070934        perf-stat.ps.node-store-misses
    585102 ±  5%     -26.6%     429702 ±  3%  perf-stat.ps.node-stores
 5.844e+12            +0.9%  5.897e+12        perf-stat.total.instructions
     99.26            +0.5       99.72        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendmsg
     99.25            +0.5       99.72        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
     99.25            +0.5       99.72        perf-profile.calltrace.cycles-pp.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
     99.26            +0.5       99.73        perf-profile.calltrace.cycles-pp.sendmsg
     99.24            +0.5       99.71        perf-profile.calltrace.cycles-pp.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe
     99.24            +0.5       99.71        perf-profile.calltrace.cycles-pp.sock_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg.do_syscall_64
     99.25            +0.5       99.72        perf-profile.calltrace.cycles-pp.___sys_sendmsg.__sys_sendmsg.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendmsg
     99.24            +0.5       99.71        perf-profile.calltrace.cycles-pp.netlink_sendmsg.sock_sendmsg.____sys_sendmsg.___sys_sendmsg.__sys_sendmsg
     97.56            +0.5       98.04        perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg
     99.22            +0.5       99.70        perf-profile.calltrace.cycles-pp.netlink_unicast.netlink_sendmsg.sock_sendmsg.____sys_sendmsg.___sys_sendmsg
     99.19            +0.5       99.68        perf-profile.calltrace.cycles-pp.sock_diag_rcv.netlink_unicast.netlink_sendmsg.sock_sendmsg.____sys_sendmsg
     98.41            +0.5       98.90        perf-profile.calltrace.cycles-pp.__mutex_lock.sock_diag_rcv.netlink_unicast.netlink_sendmsg.sock_sendmsg
      0.48            -0.4        0.07 ±  5%  perf-profile.children.cycles-pp.recvmsg
      0.46 ±  2%      -0.4        0.06        perf-profile.children.cycles-pp.___sys_recvmsg
      0.47 ±  2%      -0.4        0.07 ±  6%  perf-profile.children.cycles-pp.__sys_recvmsg
      0.45            -0.4        0.06 ±  9%  perf-profile.children.cycles-pp.____sys_recvmsg
      1.14            -0.4        0.76        perf-profile.children.cycles-pp.netlink_dump
      1.09            -0.4        0.73        perf-profile.children.cycles-pp.unix_diag_dump
      0.66            -0.3        0.37 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.26 ±  2%      -0.1        0.19 ±  2%  perf-profile.children.cycles-pp.sk_diag_fill
      0.07 ±  5%      -0.0        0.04 ± 57%  perf-profile.children.cycles-pp.__x64_sys_socket
      0.07 ±  5%      -0.0        0.04 ± 57%  perf-profile.children.cycles-pp.__sys_socket
      0.07            -0.0        0.04 ± 57%  perf-profile.children.cycles-pp.__close
      0.12 ±  4%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.memset_erms
      0.11 ±  4%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.nla_put
      0.08 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.__nlmsg_put
      0.08 ±  5%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.__socket
      0.08            -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__nla_put
      0.07            -0.0        0.05        perf-profile.children.cycles-pp.__nla_reserve
      0.07 ±  5%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.rcu_core
      0.08 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.__softirqentry_text_start
      0.07            -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.rcu_do_batch
      0.06 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.sock_i_ino
     99.89            +0.0       99.92        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.89            +0.0       99.92        perf-profile.children.cycles-pp.do_syscall_64
      0.00            +0.1        0.08        perf-profile.children.cycles-pp.__raw_callee_save___native_queued_spin_unlock
     99.26            +0.5       99.73        perf-profile.children.cycles-pp.sendmsg
     99.25            +0.5       99.72        perf-profile.children.cycles-pp.__sys_sendmsg
     99.25            +0.5       99.72        perf-profile.children.cycles-pp.___sys_sendmsg
     99.24            +0.5       99.71        perf-profile.children.cycles-pp.____sys_sendmsg
     99.24            +0.5       99.71        perf-profile.children.cycles-pp.sock_sendmsg
     99.24            +0.5       99.71        perf-profile.children.cycles-pp.netlink_sendmsg
     99.22            +0.5       99.70        perf-profile.children.cycles-pp.netlink_unicast
     97.59            +0.5       98.08        perf-profile.children.cycles-pp.osq_lock
     99.19            +0.5       99.68        perf-profile.children.cycles-pp.sock_diag_rcv
     98.41            +0.5       98.90        perf-profile.children.cycles-pp.__mutex_lock
      0.12 ±  5%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.unix_diag_dump
      0.11            -0.0        0.08        perf-profile.self.cycles-pp.memset_erms
      0.00            +0.1        0.06        perf-profile.self.cycles-pp.__raw_callee_save___native_queued_spin_unlock
      0.28 ±  5%      +0.1        0.35 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
     97.23            +0.5       97.72        perf-profile.self.cycles-pp.osq_lock




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.16.0-rc2-00840-gafd20b9290e1" of type "text/plain" (173572 bytes)

View attachment "job-script" of type "text/plain" (7906 bytes)

View attachment "job.yaml" of type "text/plain" (5389 bytes)

View attachment "reproduce" of type "text/plain" (342 bytes)
