Message-ID: <20171206021450.GE21779@yexl-desktop>
Date:   Wed, 6 Dec 2017 10:14:50 +0800
From:   kernel test robot <xiaolong.ye@...el.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Andy Lutomirski <luto@...capital.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp-robot] [locking/x86]  450cbdd012: will-it-scale.per_process_ops
 16.1% improvement


Greetings,

FYI, we noticed a 16.1% improvement in will-it-scale.per_process_ops due to commit:


commit: 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730 ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
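
As a quick sanity check, the headline figure can be recomputed from the raw will-it-scale values reported in the comparison table in this message. The sketch below assumes the change column is the plain relative change of the new value against the parent commit's value; the helper name percent_change is ours, not part of lkp-tests.

```python
# Values copied from the will-it-scale rows of the comparison table in this report.
BASE_PER_PROCESS = 9560719    # parent commit b04db8e19f
NEW_PER_PROCESS = 11095797    # commit 450cbdd012
BASE_PER_THREAD = 9562708
NEW_PER_THREAD = 11197397


def percent_change(base, new):
    """Relative change of `new` versus `base`, in percent (assumed definition)."""
    return (new - base) / base * 100.0


print(f"per_process_ops: {percent_change(BASE_PER_PROCESS, NEW_PER_PROCESS):+.1f}%")
print(f"per_thread_ops:  {percent_change(BASE_PER_THREAD, NEW_PER_THREAD):+.1f}%")
```

Under that assumption the two results round to the +16.1% and +17.1% shown in the table.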

in testcase: will-it-scale
on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
with the following parameters:

	test: futex3
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of the test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/futex3/will-it-scale

commit: 
  b04db8e19f ("rcu: Use lockdep to assert IRQs are disabled/enabled")
  450cbdd012 ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")

b04db8e19fc2e913 450cbdd0125cfa5d7bbf9e2a6b 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   9560719           +16.1%   11095797        will-it-scale.per_process_ops
   9562708           +17.1%   11197397        will-it-scale.per_thread_ops
      0.64            -7.3%       0.59        will-it-scale.scalability
      2019            -3.3%       1953        will-it-scale.time.system_time
    490.82           +13.5%     556.90        will-it-scale.time.user_time
      8.16            +1.1        9.26        mpstat.cpu.usr%
      1834 ± 12%     -15.1%       1558 ±  2%  vmstat.system.cs
     30.11 ±  3%      -4.0%      28.91        boot-time.boot
    883.63 ±  4%      -5.2%     838.05 ±  2%  boot-time.idle
      1752           -60.3%     696.00 ± 87%  meminfo.Mlocked
      1752           -60.3%     696.00 ± 87%  meminfo.Unevictable
    991.50           -60.5%     391.25 ± 86%  numa-meminfo.node1.Mlocked
    991.50           -60.5%     391.25 ± 86%  numa-meminfo.node1.Unevictable
    104.91            +2.7%     107.75        turbostat.CorWatt
    132.50            +2.2%     135.36        turbostat.PkgWatt
    247.25           -60.5%      97.75 ± 86%  numa-vmstat.node1.nr_mlock
    247.25           -60.5%      97.75 ± 86%  numa-vmstat.node1.nr_unevictable
    247.25           -60.5%      97.75 ± 86%  numa-vmstat.node1.nr_zone_unevictable
      5116 ±  2%     +10.7%       5664 ±  3%  slabinfo.cred_jar.active_objs
      5116 ±  2%     +10.7%       5664 ±  3%  slabinfo.cred_jar.num_objs
      1827 ±  7%     +19.3%       2180 ±  3%  slabinfo.fsnotify_mark_connector.active_objs
      1827 ±  7%     +19.3%       2180 ±  3%  slabinfo.fsnotify_mark_connector.num_objs
      6521 ±  3%     +13.5%       7399        slabinfo.kmalloc-96.active_objs
      6584 ±  3%     +12.8%       7424        slabinfo.kmalloc-96.num_objs
  1.99e+12           +14.6%  2.281e+12        perf-stat.branch-instructions
      0.01 ±  2%      -0.0        0.01 ±  3%  perf-stat.branch-miss-rate%
  2.43e+08            +2.3%  2.486e+08        perf-stat.cache-misses
    690932 ± 12%     -15.2%     586079 ±  2%  perf-stat.context-switches
      1.03           -12.7%       0.90        perf-stat.cpi
  1.63e+13            -1.1%  1.611e+13        perf-stat.cpu-cycles
 3.459e+12           +13.5%  3.925e+12        perf-stat.dTLB-loads
      0.00 ±  7%      -0.0        0.00 ±  5%  perf-stat.dTLB-store-miss-rate%
 2.881e+12 ±  3%     +16.3%  3.351e+12        perf-stat.dTLB-stores
 2.956e+09 ± 46%    +146.2%  7.279e+09 ± 27%  perf-stat.iTLB-load-misses
 1.584e+13           +13.2%  1.794e+13        perf-stat.instructions
      8495 ± 83%     -68.8%       2651 ± 26%  perf-stat.instructions-per-iTLB-miss
      0.97           +14.5%       1.11        perf-stat.ipc
     32.91            -0.9       31.99        perf-stat.node-load-miss-rate%
     31.09            -3.2       27.86        perf-stat.node-store-miss-rate%
  84812551 ±  2%     -13.9%   72994159 ±  2%  perf-stat.node-store-misses
     53.36            -8.0       45.33        perf-profile.calltrace.cycles.do_futex.sys_futex.entry_SYSCALL_64_fastpath
     57.06            -7.0       50.02        perf-profile.calltrace.cycles.sys_futex.entry_SYSCALL_64_fastpath
     62.70            -5.7       56.96        perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
     17.33 ±  3%      -4.9       12.46        perf-profile.calltrace.cycles.hash_futex.do_futex.sys_futex.entry_SYSCALL_64_fastpath
     30.58            -4.5       26.11 ±  2%  perf-profile.calltrace.cycles.futex_wake.do_futex.sys_futex.entry_SYSCALL_64_fastpath
     19.36            -2.5       16.89 ±  3%  perf-profile.calltrace.cycles.get_futex_key.futex_wake.do_futex.sys_futex.entry_SYSCALL_64_fastpath
     13.13 ±  2%      -1.9       11.19 ±  3%  perf-profile.calltrace.cycles.get_futex_key_refs.get_futex_key.futex_wake.do_futex.sys_futex
     10.09 ± 13%      -0.4        9.71 ± 14%  perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle
     12.38 ±  9%      -0.4       12.02 ± 11%  perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.verify_cpu
     12.38 ±  9%      -0.4       12.02 ± 11%  perf-profile.calltrace.cycles.start_secondary.verify_cpu
     12.36 ±  9%      -0.4       12.00 ± 11%  perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.verify_cpu
     12.67 ±  9%      -0.4       12.31 ± 11%  perf-profile.calltrace.cycles.verify_cpu
     12.11 ±  9%      -0.3       11.77 ± 11%  perf-profile.calltrace.cycles.cpuidle_enter_state.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry
     12.20 ±  9%      -0.3       11.86 ± 11%  perf-profile.calltrace.cycles.cpuidle_enter.call_cpuidle.do_idle.cpu_startup_entry.start_secondary
     12.20 ±  9%      -0.3       11.87 ± 11%  perf-profile.calltrace.cycles.call_cpuidle.do_idle.cpu_startup_entry.start_secondary.verify_cpu
     20.03 ±  3%      +5.2       25.19 ±  2%  perf-profile.calltrace.cycles.entry_SYSCALL_64
     54.06            -7.8       46.21        perf-profile.children.cycles.do_futex
     57.42            -6.9       50.49        perf-profile.children.cycles.sys_futex
     62.78            -5.7       57.07        perf-profile.children.cycles.entry_SYSCALL_64_fastpath
     17.30            -5.3       11.96 ±  4%  perf-profile.children.cycles.get_futex_key_refs
     18.02 ±  3%      -4.7       13.32        perf-profile.children.cycles.hash_futex
     30.97            -4.4       26.59 ±  2%  perf-profile.children.cycles.futex_wake
     19.74            -2.4       17.36 ±  3%  perf-profile.children.cycles.get_futex_key
     10.19 ± 14%      -0.4        9.79 ± 15%  perf-profile.children.cycles.poll_idle
     12.65 ±  9%      -0.4       12.29 ± 11%  perf-profile.children.cycles.do_idle
     12.38 ±  9%      -0.4       12.02 ± 11%  perf-profile.children.cycles.start_secondary
     12.67 ±  9%      -0.4       12.31 ± 11%  perf-profile.children.cycles.verify_cpu
     12.67 ±  9%      -0.4       12.31 ± 11%  perf-profile.children.cycles.cpu_startup_entry
     12.39 ± 10%      -0.3       12.04 ± 11%  perf-profile.children.cycles.cpuidle_enter_state
     12.49 ± 10%      -0.3       12.15 ± 11%  perf-profile.children.cycles.call_cpuidle
     12.48 ± 10%      -0.3       12.14 ± 11%  perf-profile.children.cycles.cpuidle_enter
     20.03 ±  3%      +5.2       25.19 ±  2%  perf-profile.children.cycles.entry_SYSCALL_64
     17.13            -5.3       11.82 ±  3%  perf-profile.self.cycles.get_futex_key_refs
     17.94 ±  3%      -4.8       13.14        perf-profile.self.cycles.hash_futex
     10.16 ± 14%      -0.4        9.72 ± 14%  perf-profile.self.cycles.poll_idle
      6.51            -0.4        6.16        perf-profile.self.cycles.get_futex_key
      4.38 ±  2%      +0.9        5.33 ±  2%  perf-profile.self.cycles.entry_SYSCALL_64_fastpath
      4.69 ±  2%      +1.1        5.74 ±  2%  perf-profile.self.cycles.do_futex
      6.35 ±  2%      +1.2        7.56 ±  2%  perf-profile.self.cycles.futex_wake
     20.03 ±  3%      +5.2       25.19 ±  2%  perf-profile.self.cycles.entry_SYSCALL_64
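
A note on reading the table above: the derived perf-stat rows can be cross-checked against the raw counters, and rows whose change column carries no percent sign appear to be absolute differences rather than relative ones. The sketch below is only a consistency check on the printed numbers; it assumes ipc is instructions divided by cpu-cycles and cpi is the inverse, which matches the values shown but is not taken from the lkp-tests source.

```python
# perf-stat totals copied from the table above: (parent b04db8e19f, commit 450cbdd012).
INSTRUCTIONS = (1.584e13, 1.794e13)
CPU_CYCLES = (1.63e13, 1.611e13)


def ipc(instructions, cycles):
    """Instructions per cycle (assumed definition of perf-stat.ipc)."""
    return instructions / cycles


def cpi(instructions, cycles):
    """Cycles per instruction (assumed definition of perf-stat.cpi)."""
    return cycles / instructions


for label, insns, cycles in zip(("b04db8e19f", "450cbdd012"), INSTRUCTIONS, CPU_CYCLES):
    print(f"{label}: ipc={ipc(insns, cycles):.2f} cpi={cpi(insns, cycles):.2f}")

# Rows such as mpstat.cpu.usr% show a bare "+1.1": that matches new minus old.
usr_delta = 9.26 - 8.16
print(f"mpstat.cpu.usr% delta: {usr_delta:+.1f}")
```

The computed values agree with the 0.97 and 1.11 ipc, the 1.03 and 0.90 cpi, and the +1.1 delta printed in the table.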


                                                                                
                             will-it-scale.per_process_ops                      
                                                                                
  1.2e+07 +-+---------------------------------------------------------------+   
          O O O  O O O   O  O O O O O  O O O O O  O                         |   
    1e+07 +-+                                                               |   
          |.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.|   
          |                                                                 |   
    8e+06 +-+                                                               |   
          |                                                                 |   
    6e+06 +-+                                                               |   
          |                                                                 |   
    4e+06 +-+                                                               |   
          |                                                                 |   
          |                                                                 |   
    2e+06 +-+                                                               |   
          |                                                                 |   
        0 +-+----------O----------------------------------------------------+   
                                                                                
                                                                                                                                                                
                             will-it-scale.per_thread_ops                       
                                                                                
  1.2e+07 +-+---------------------------------------------------------------+   
          O O O  O O O   O  O O O O O  O O O O O  O                         |   
    1e+07 +-+                                                               |   
          |.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.|   
          |                                                                 |   
    8e+06 +-+                                                               |   
          |                                                                 |   
    6e+06 +-+                                                               |   
          |                                                                 |   
    4e+06 +-+                                                               |   
          |                                                                 |   
          |                                                                 |   
    2e+06 +-+                                                               |   
          |                                                                 |   
        0 +-+----------O----------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong

View attachment "config-4.14.0-rc8-00080-g450cbdd" of type "text/plain" (163362 bytes)

View attachment "job-script" of type "text/plain" (7133 bytes)

View attachment "job.yaml" of type "text/plain" (4757 bytes)

View attachment "reproduce" of type "text/plain" (328 bytes)
