Message-ID: <20210420063842.GD31773@xsang-OptiPlex-9020>
Date:   Tue, 20 Apr 2021 14:38:42 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Stefan Metzmacher <metze@...ba.org>
Cc:     Jens Axboe <axboe@...nel.dk>, LKML <linux-kernel@...r.kernel.org>,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [io_uring]  7c30f36a98:  will-it-scale.per_thread_ops 9.1%
 improvement



Greetings,

FYI, we noticed a 9.1% improvement in will-it-scale.per_thread_ops due to commit:


commit: 7c30f36a98ae488741178d69662e4f2baa53e7f6 ("io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with the following parameters:

	nr_task: 50%
	mode: thread
	test: unix1
	cpufreq_governor: performance
	ucode: 0x16

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
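
For readers unfamiliar with the workload, here is a minimal Python sketch of the kind of loop this test exercises. It assumes that unix1 amounts to a one-byte write followed by a one-byte read over an AF_UNIX stream socket pair, which is consistent with the __libc_write / unix_stream_sendmsg and __libc_read / sock_read_iter call chains in the profile further down. The names unix1_ops and run_copies are hypothetical, and this is not the will-it-scale source (which is written in C). Python threads share the interpreter lock, so the sketch only illustrates the shape of the loop and is not a usable benchmark.

```python
import socket
import threading
import time


def unix1_ops(duration_s: float = 0.2) -> int:
    """Count write+read round trips over one AF_UNIX stream socket pair.

    Hypothetical stand-in for a single copy of the unix1 testcase.
    """
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    ops = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            a.send(b"x")   # one-byte write into the socket pair
            b.recv(1)      # one-byte read from the other end
            ops += 1
    finally:
        a.close()
        b.close()
    return ops


def run_copies(n: int, duration_s: float = 0.2) -> list:
    """Run n copies in parallel threads and return each copy's op count."""
    counts = [0] * n

    def worker(i: int) -> None:
        counts[i] = unix1_ops(duration_s)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    # Mirror "runs it from 1 through n copies" on a very small scale.
    for n in (1, 2, 4):
        counts = run_copies(n)
        print(n, "copies, ops per copy:", sum(counts) // n)
```

Each copy uses its own socket pair, so the copies do not share a socket; any contention comes from shared kernel paths, not from the test's own data.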





Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-hsw-4ex1/unix1/will-it-scale/0x16

commit: 
  678eeba481 ("io-wq: warn on creating manager while exiting")
  7c30f36a98 ("io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()")

678eeba481d8c161 7c30f36a98ae488741178d69662 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  30824092            +9.1%   33623774        will-it-scale.72.threads
    428111            +9.1%     466996        will-it-scale.per_thread_ops
  30824092            +9.1%   33623774        will-it-scale.workload
    314351 ±  4%      -8.6%     287222        numa-meminfo.node0.Unevictable
      0.04 ±116%  +27922.2%       9.90 ±123%  perf-sched.sch_delay.max.ms.__traceiter_sched_switch.__traceiter_sched_switch.schedule_timeout.rcu_gp_kthread.kthread
     15.00            +6.7%      16.00        vmstat.cpu.us
     78587 ±  4%      -8.6%      71805        numa-vmstat.node0.nr_unevictable
     78587 ±  4%      -8.6%      71805        numa-vmstat.node0.nr_zone_unevictable
      1769           -12.7%       1544        syscalls.sys_read.med
      1780            -9.4%       1613        syscalls.sys_write.med
     19842 ±  3%     -20.6%      15756 ± 12%  softirqs.CPU11.RCU
     12942 ±  8%     +17.0%      15137 ± 10%  softirqs.CPU134.RCU
     13720 ± 11%     +19.2%      16356 ± 10%  softirqs.CPU55.RCU
     36667 ±  8%     -41.0%      21647 ± 38%  softirqs.CPU83.SCHED
    266.33 ±  8%     -47.4%     140.00 ± 58%  interrupts.CPU11.RES:Rescheduling_interrupts
      1118 ± 19%     -47.1%     592.00 ± 50%  interrupts.CPU11.TLB:TLB_shootdowns
    992.50 ± 14%     -35.1%     643.67 ± 39%  interrupts.CPU120.TLB:TLB_shootdowns
      1914 ± 35%    +136.0%       4518 ± 43%  interrupts.CPU129.NMI:Non-maskable_interrupts
      1914 ± 35%    +136.0%       4518 ± 43%  interrupts.CPU129.PMI:Performance_monitoring_interrupts
     36.17 ± 71%    +206.9%     111.00 ± 44%  interrupts.CPU131.RES:Rescheduling_interrupts
      1159 ± 18%     +72.2%       1996 ± 32%  interrupts.CPU134.CAL:Function_call_interrupts
    374.83 ± 61%    +139.0%     895.67 ± 40%  interrupts.CPU134.TLB:TLB_shootdowns
      2810 ± 37%    +134.0%       6578 ± 33%  interrupts.CPU45.NMI:Non-maskable_interrupts
      2810 ± 37%    +134.0%       6578 ± 33%  interrupts.CPU45.PMI:Performance_monitoring_interrupts
      1605 ± 19%     +76.6%       2836 ± 48%  interrupts.CPU52.CAL:Function_call_interrupts
      2231 ± 27%     -37.2%       1400 ± 22%  interrupts.CPU62.CAL:Function_call_interrupts
      6880 ± 25%     -46.7%       3669 ± 57%  interrupts.CPU62.NMI:Non-maskable_interrupts
      6880 ± 25%     -46.7%       3669 ± 57%  interrupts.CPU62.PMI:Performance_monitoring_interrupts
    226.50 ± 18%     -47.5%     119.00 ± 63%  interrupts.CPU62.RES:Rescheduling_interrupts
      1169 ± 18%     -44.4%     650.83 ± 52%  interrupts.CPU62.TLB:TLB_shootdowns
    235.00 ± 13%     -59.4%      95.33 ± 65%  interrupts.CPU63.RES:Rescheduling_interrupts
    384.17 ± 64%    +120.0%     845.33 ± 30%  interrupts.CPU84.TLB:TLB_shootdowns
      1870 ±  8%     -26.7%       1370 ± 29%  interrupts.CPU93.CAL:Function_call_interrupts
      1092 ± 16%     -45.3%     597.33 ± 66%  interrupts.CPU93.TLB:TLB_shootdowns
 3.702e+10            +9.1%  4.038e+10        perf-stat.i.branch-instructions
 4.711e+08            +9.0%  5.134e+08        perf-stat.i.branch-misses
      1.13            -8.4%       1.04        perf-stat.i.cpi
 5.421e+10            +9.0%  5.909e+10        perf-stat.i.dTLB-loads
  61939091            +8.9%   67468763        perf-stat.i.dTLB-store-misses
 3.777e+10            +8.9%  4.112e+10        perf-stat.i.dTLB-stores
  64979413 ±  2%      +9.4%   71098260 ±  2%  perf-stat.i.iTLB-load-misses
 1.703e+08 ±  3%     +14.3%  1.947e+08 ± 15%  perf-stat.i.iTLB-loads
 1.857e+11            +9.1%  2.026e+11        perf-stat.i.instructions
      0.89            +9.2%       0.97        perf-stat.i.ipc
    896.93            +9.0%     978.06        perf-stat.i.metric.M/sec
     22535 ±  7%     +13.9%      25662 ±  5%  perf-stat.i.node-loads
      0.07            -9.1%       0.06 ±  2%  perf-stat.overall.MPKI
      1.13            -8.4%       1.03        perf-stat.overall.cpi
      0.89            +9.1%       0.97        perf-stat.overall.ipc
 3.687e+10            +9.1%  4.023e+10        perf-stat.ps.branch-instructions
 4.693e+08            +9.0%  5.116e+08        perf-stat.ps.branch-misses
 5.399e+10            +9.1%  5.888e+10        perf-stat.ps.dTLB-loads
  61667162            +9.0%   67198925        perf-stat.ps.dTLB-store-misses
 3.761e+10            +8.9%  4.097e+10        perf-stat.ps.dTLB-stores
  64692296 ±  2%      +9.5%   70834658 ±  2%  perf-stat.ps.iTLB-load-misses
 1.695e+08 ±  3%     +14.4%  1.939e+08 ± 15%  perf-stat.ps.iTLB-loads
 1.849e+11            +9.1%  2.018e+11        perf-stat.ps.instructions
     23463 ±  8%     +13.9%      26730 ±  8%  perf-stat.ps.node-loads
 5.594e+13            +9.1%  6.104e+13        perf-stat.total.instructions
     31.07            -2.3       28.80 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
     37.16            -2.2       35.01 ±  9%  perf-profile.calltrace.cycles-pp.__libc_read
     20.02            -1.5       18.51 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     17.67 ±  2%      -1.5       16.19 ±  9%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
     20.78 ±  2%      -1.4       19.34 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
     16.56            -1.4       15.14 ±  9%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
     15.50            -1.3       14.21 ±  9%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     14.88 ±  2%      -1.3       13.62 ±  9%  perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.59            -1.2       15.38 ±  9%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     13.54            -1.2       12.38 ±  9%  perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.18            -1.2       12.02 ±  9%  perf-profile.calltrace.cycles-pp.sock_read_iter.new_sync_read.vfs_read.ksys_read.do_syscall_64
     14.42 ±  2%      -1.1       13.28 ±  9%  perf-profile.calltrace.cycles-pp.sock_write_iter.new_sync_write.vfs_write.ksys_write.do_syscall_64
     13.50 ±  2%      -1.0       12.52 ±  9%  perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write.ksys_write
     12.25 ±  2%      -0.9       11.36 ±  9%  perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write
      3.14 ±  2%      -0.8        2.31 ±  9%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.__libc_read
     10.57            -0.8        9.81 ±  9%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.__libc_read
      1.70 ±  2%      -0.5        1.17 ±  9%  perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.new_sync_read.vfs_read.ksys_read
      2.15 ±  2%      -0.4        1.72 ±  9%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
      1.05 ±  2%      -0.2        0.81 ±  9%  perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      0.60            -0.2        0.44 ± 44%  perf-profile.calltrace.cycles-pp.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all
      1.52 ±  2%      -0.2        1.36 ±  8%  perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic
      1.61            -0.2        1.46 ±  8%  perf-profile.calltrace.cycles-pp.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      1.64            -0.2        1.49 ±  9%  perf-profile.calltrace.cycles-pp.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
     40.84            -2.9       37.89 ±  9%  perf-profile.children.cycles-pp.do_syscall_64
     37.53            -2.1       35.38 ±  9%  perf-profile.children.cycles-pp.__libc_read
     17.70 ±  2%      -1.5       16.22 ±  9%  perf-profile.children.cycles-pp.ksys_write
     16.60            -1.4       15.19 ±  9%  perf-profile.children.cycles-pp.vfs_write
     15.55            -1.3       14.26 ±  9%  perf-profile.children.cycles-pp.vfs_read
     14.91 ±  2%      -1.3       13.65 ±  9%  perf-profile.children.cycles-pp.new_sync_write
     16.62            -1.2       15.41 ±  9%  perf-profile.children.cycles-pp.ksys_read
     13.22            -1.2       12.06 ±  9%  perf-profile.children.cycles-pp.sock_read_iter
     13.58            -1.2       12.42 ±  9%  perf-profile.children.cycles-pp.new_sync_read
     14.47 ±  2%      -1.1       13.33 ±  9%  perf-profile.children.cycles-pp.sock_write_iter
     13.52 ±  2%      -1.0       12.55 ±  9%  perf-profile.children.cycles-pp.sock_sendmsg
     12.36 ±  2%      -0.9       11.42 ±  9%  perf-profile.children.cycles-pp.unix_stream_sendmsg
      5.51 ±  2%      -0.8        4.72 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      1.71 ±  2%      -0.5        1.20 ±  9%  perf-profile.children.cycles-pp.sock_recvmsg
      2.18 ±  2%      -0.4        1.74 ±  9%  perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
      1.22 ±  2%      -0.3        0.88 ±  8%  perf-profile.children.cycles-pp.__x86_retpoline_rax
      0.52            -0.2        0.31 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.93            -0.2        0.72 ±  9%  perf-profile.children.cycles-pp.fsnotify
      2.25 ±  2%      -0.2        2.06 ±  9%  perf-profile.children.cycles-pp.__check_object_size
      1.55 ±  2%      -0.2        1.40 ±  8%  perf-profile.children.cycles-pp.unix_destruct_scm
      1.64            -0.2        1.49 ±  9%  perf-profile.children.cycles-pp.skb_release_all
      1.62            -0.2        1.47 ±  8%  perf-profile.children.cycles-pp.skb_release_head_state
      0.58            -0.1        0.44 ±  8%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.47 ±  4%      -0.1        0.35 ±  9%  perf-profile.children.cycles-pp.wait_for_unix_gc
      0.61 ±  2%      -0.1        0.51 ±  8%  perf-profile.children.cycles-pp.unix_write_space
      0.40            -0.1        0.31 ± 10%  perf-profile.children.cycles-pp.__x64_sys_read
      0.63            -0.1        0.55 ±  9%  perf-profile.children.cycles-pp.__might_sleep
      0.35 ±  3%      -0.1        0.29 ±  8%  perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
      0.13            -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.check_stack_object
      0.13 ±  5%      -0.0        0.09 ± 10%  perf-profile.children.cycles-pp.unix_scm_to_skb
      0.45 ±  3%      -0.0        0.42 ±  8%  perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
      0.09 ±  5%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
      0.57 ±  2%      +0.1        0.71 ± 12%  perf-profile.children.cycles-pp.__ksize
      1.12 ±  2%      -0.8        0.37 ±  9%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.50 ±  2%      -0.4        0.09 ± 12%  perf-profile.self.cycles-pp.sock_recvmsg
      1.13            -0.4        0.73 ±  9%  perf-profile.self.cycles-pp.sock_read_iter
      0.98 ±  3%      -0.3        0.68 ±  9%  perf-profile.self.cycles-pp.__x86_retpoline_rax
      0.36            -0.2        0.13 ±  8%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
      0.89            -0.2        0.69 ±  9%  perf-profile.self.cycles-pp.fsnotify
      0.29 ±  3%      -0.2        0.10 ±  9%  perf-profile.self.cycles-pp.security_socket_recvmsg
      0.93 ±  2%      -0.2        0.74 ±  9%  perf-profile.self.cycles-pp.sock_write_iter
      0.92 ±  4%      -0.2        0.76 ± 10%  perf-profile.self.cycles-pp.ftrace_syscall_exit
      0.40 ±  2%      -0.2        0.25 ± 10%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      1.19 ±  2%      -0.1        1.04 ±  9%  perf-profile.self.cycles-pp.unix_stream_sendmsg
      0.56 ±  2%      -0.1        0.42 ±  9%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.46 ±  6%      -0.1        0.34 ±  9%  perf-profile.self.cycles-pp.syscall_trace_enter
      0.37 ±  2%      -0.1        0.26 ±  8%  perf-profile.self.cycles-pp.new_sync_write
      0.25 ±  3%      -0.1        0.14 ± 11%  perf-profile.self.cycles-pp.alloc_skb_with_frags
      0.34            -0.1        0.24 ± 10%  perf-profile.self.cycles-pp.__x64_sys_read
      0.59            -0.1        0.50 ±  8%  perf-profile.self.cycles-pp.unix_write_space
      0.40 ±  2%      -0.1        0.31 ±  9%  perf-profile.self.cycles-pp.unix_destruct_scm
      0.55 ±  2%      -0.1        0.47 ±  9%  perf-profile.self.cycles-pp.__alloc_skb
      0.28 ±  2%      -0.1        0.20 ±  9%  perf-profile.self.cycles-pp.ksys_write
      0.48 ±  2%      -0.1        0.41 ± 10%  perf-profile.self.cycles-pp.vfs_read
      0.16 ±  6%      -0.1        0.09 ± 13%  perf-profile.self.cycles-pp.skb_copy_datagram_iter
      0.49 ±  2%      -0.1        0.43 ±  9%  perf-profile.self.cycles-pp.unix_stream_recvmsg
      0.13 ±  5%      -0.1        0.08 ± 11%  perf-profile.self.cycles-pp.wait_for_unix_gc
      0.13 ±  5%      -0.1        0.07 ± 10%  perf-profile.self.cycles-pp.unix_scm_to_skb
      0.21 ±  5%      -0.1        0.16 ± 11%  perf-profile.self.cycles-pp.sock_alloc_send_pskb
      0.48            -0.1        0.43 ± 10%  perf-profile.self.cycles-pp.vfs_write
      0.08 ±  7%      -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
      0.11 ±  4%      -0.0        0.06 ± 11%  perf-profile.self.cycles-pp.check_stack_object
      0.30 ±  3%      -0.0        0.26 ±  8%  perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
      0.14 ±  4%      -0.0        0.10 ±  9%  perf-profile.self.cycles-pp.sock_sendmsg
      0.22 ±  4%      -0.0        0.18 ±  9%  perf-profile.self.cycles-pp.do_syscall_64
      0.21 ±  2%      -0.0        0.18 ± 12%  perf-profile.self.cycles-pp.__skb_datagram_iter
      0.27 ±  2%      -0.0        0.23 ±  9%  perf-profile.self.cycles-pp.__x64_sys_write
      0.23 ±  4%      -0.0        0.19 ±  8%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.18 ±  3%      +0.1        0.24 ±  8%  perf-profile.self.cycles-pp.security_file_permission
      0.23 ±  2%      +0.1        0.31 ±  8%  perf-profile.self.cycles-pp.ksys_read
      0.22 ±  3%      +0.1        0.30 ±  9%  perf-profile.self.cycles-pp.apparmor_socket_recvmsg
      0.55            +0.1        0.70 ± 12%  perf-profile.self.cycles-pp.__ksize
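
As a reading aid for the table above, the sketch below recomputes a few of its entries. It assumes that %change on throughput rows is a relative change against the parent commit, and that on perf-profile rows the middle column is an absolute difference in percentage points; the second assumption is inferred from the printed values and is not stated in the report. The function names are hypothetical.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, measured against the parent commit."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference, as assumed for the perf-profile rows."""
    return new - base


def threads_for(cpu_threads: int, nr_task_percent: int) -> int:
    """Number of test threads implied by the nr_task percentage."""
    return cpu_threads * nr_task_percent // 100


if __name__ == "__main__":
    # nr_task: 50% on the 144-thread machine, cf. will-it-scale.72.threads
    print("threads:", threads_for(144, 50))

    # will-it-scale.per_thread_ops: 428111 -> 466996
    print(f"per_thread_ops: {pct_change(428111, 466996):+.1f}%")

    # will-it-scale.workload: 30824092 -> 33623774
    print(f"workload: {pct_change(30824092, 33623774):+.1f}%")

    # perf-profile ...entry_SYSCALL_64_after_hwframe.__libc_read: 31.07 -> 28.80
    print(f"profile delta: {point_delta(31.07, 28.80):+.1f}")
```

Rows whose displayed values are already rounded (for example cpi and ipc) may not reproduce the printed %change exactly when recomputed from the table.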


                                                                                
                            will-it-scale.per_thread_ops                        
                                                                                
  480000 +------------------------------------------------------------------+   
  475000 |-+                               OO O O O O                       |   
         |   O           O O O O O O   O O                                  |   
  470000 |-O   O O                   O                            O O O O O |   
  465000 |-+       O O O                              O O O O O O           |   
  460000 |-+  .+.+.+.+.+.     .+.+.     .+.+                                |   
  455000 |.+.+           +.+.+     +.+.+    +.                              |   
         |                                    +.+                           |   
  450000 |-+                                    :                           |   
  445000 |-+                                     :                          |   
  440000 |-+                                     :                          |   
  435000 |-+                                     :                          |   
         |                                        :                         |   
  430000 |-+                                      +.+.+.+.+.+               |   
  425000 +------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.12.0-rc2-00011-g7c30f36a98ae" of type "text/plain" (172883 bytes)

View attachment "job-script" of type "text/plain" (7797 bytes)

View attachment "job.yaml" of type "text/plain" (5335 bytes)

View attachment "reproduce" of type "text/plain" (336 bytes)
