lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <1424927813.10337.19.camel@intel.com>
Date:	Thu, 26 Feb 2015 13:16:53 +0800
From:	Huang Ying <ying.huang@...el.com>
To:	Catalin Marinas <catalin.marinas@....com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, LKP ML <lkp@...org>
Subject: [LKP] [futex] 76835b0ebf8: -8.1% will-it-scale.per_thread_ops

FYI, we noticed the following changes on

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
commit 76835b0ebf8a7fe85beb03c75121419a7dec52f0 ("futex: Ensure get_futex_key_refs() always implies a barrier")


testbox/testcase/testparams: lkp-wsx01/will-it-scale/performance-futex4

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
         %stddev     %change         %stddev
             \          |                \  
   6314259 ±  0%      -8.1%    5800079 ±  0%  will-it-scale.per_thread_ops
   6274871 ±  0%      -8.1%    5768747 ±  0%  will-it-scale.per_process_ops
      0.64 ±  0%      +4.6%       0.67 ±  1%  will-it-scale.scalability
      0.79 ±  2%    +716.1%       6.48 ±  1%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wait_setup.futex_wait.do_futex.sys_futex
         2 ± 44%    +200.0%          6 ± 21%  sched_debug.cpu#79.nr_uninterruptible
      1320 ± 49%     -64.8%        464 ± 15%  sched_debug.cpu#61.ttwu_count
       167 ± 21%     -45.9%         90 ± 49%  sched_debug.cfs_rq[61]:/.blocked_load_avg
         7 ± 18%     +48.6%         10 ± 28%  sched_debug.cfs_rq[25]:/.load
         7 ± 18%     +60.0%         11 ± 34%  sched_debug.cpu#25.load
       175 ± 20%     -44.3%         97 ± 47%  sched_debug.cfs_rq[61]:/.tg_load_contrib
      2406 ± 49%     -58.3%       1003 ± 25%  sched_debug.cpu#61.nr_switches
      2417 ± 49%     -58.1%       1014 ± 25%  sched_debug.cpu#61.sched_count
       613 ± 19%     -34.6%        401 ± 25%  sched_debug.cpu#61.sched_goidle
      4.56 ±  1%     +37.4%       6.26 ±  2%  perf-profile.cpu-cycles.get_futex_key.futex_wait_setup.futex_wait.do_futex.sys_futex
     85583 ±  9%     -14.8%      72913 ±  7%  sched_debug.cpu#0.nr_load_updates
     29.29 ±  0%     +19.2%      34.90 ±  2%  perf-profile.cpu-cycles.futex_wait_setup.futex_wait.do_futex.sys_futex.system_call_fastpath
      1.05 ±  3%     -10.6%       0.94 ±  1%  perf-profile.cpu-cycles.testcase
      2.43 ±  2%     -10.4%       2.18 ±  0%  perf-profile.cpu-cycles.sysret_check.syscall
     84405 ±  7%     -11.0%      75139 ±  7%  sched_debug.cfs_rq[0]:/.exec_clock
      1.07 ±  2%     -14.7%       0.91 ±  2%  perf-profile.cpu-cycles._raw_spin_unlock.futex_wait_setup.futex_wait.do_futex.sys_futex
      5.91 ±  0%     -10.2%       5.31 ±  2%  perf-profile.cpu-cycles._raw_spin_lock.futex_wait_setup.futex_wait.do_futex.sys_futex
     66640 ±  5%      +5.7%      70433 ±  6%  sched_debug.cpu#10.nr_load_updates
      4274 ±  3%     -12.0%       3762 ±  7%  sched_debug.cpu#21.curr->pid
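
Each row above reads: base-commit value ± stddev, relative change, new-commit value ± stddev, then the metric name. The sketch below parses one row and recomputes %change as (new - base) / base * 100. The regex and its column layout are assumptions inferred from how the rows look. They are not a documented LKP format. The example row is copied from the per_thread_ops line of this table.

```python
import re

# One row of the comparison table (lkp-wsx01, performance-futex4).
ROW = "   6314259 ±  0%      -8.1%    5800079 ±  0%  will-it-scale.per_thread_ops"

# Assumed layout: base ± stddev%, %change, new ± stddev%, metric name.
ROW_RE = re.compile(
    r"^\s*(?P<base>[\d.]+)\s*±\s*(?P<base_sd>\d+)%"
    r"\s+(?P<change>[+-][\d.]+)%"
    r"\s+(?P<new>[\d.]+)\s*±\s*(?P<new_sd>\d+)%"
    r"\s+(?P<metric>\S+)\s*$"
)

def parse_row(line):
    """Split one comparison row into numbers and the metric name."""
    m = ROW_RE.match(line)
    if m is None:
        raise ValueError(f"not a comparison row: {line!r}")
    return {
        "base": float(m["base"]),
        "new": float(m["new"]),
        "reported_change": float(m["change"]),
        "base_stddev_pct": int(m["base_sd"]),
        "new_stddev_pct": int(m["new_sd"]),
        "metric": m["metric"],
    }

def percent_change(base, new):
    """Relative change of the new commit against the base, in percent."""
    return (new - base) / base * 100.0

row = parse_row(ROW)
print(row["metric"], round(percent_change(row["base"], row["new"]), 1),
      row["reported_change"])
```

For large counters such as per_thread_ops, the recomputed value matches the reported -8.1%. For small, rounded values the two can drift apart. For example, 0.79 -> 6.48 recomputes to about +720% while the table reports +716.1%. This is presumably because the report works from unrounded means.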

testbox/testcase/testparams: wsm/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
  11676004 ±  0%     -10.3%   10473333 ±  0%  will-it-scale.per_thread_ops
  11515138 ±  0%      -8.8%   10501984 ±  0%  will-it-scale.per_process_ops
      0.69 ±  3%      +8.2%       0.75 ±  1%  will-it-scale.scalability
      1.76 ±  4%    +364.0%       8.18 ±  0%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
  76838319 ± 12%     +24.4%   95586476 ±  5%  cpuidle.POLL.time
    163113 ± 44%     +89.7%     309491 ± 14%  sched_debug.cfs_rq[6]:/.spread0
     16.31 ±  1%     +40.2%      22.86 ±  0%  perf-profile.cpu-cycles.futex_wake.do_futex.sys_futex.system_call_fastpath.syscall
        89 ± 17%     -26.8%         65 ± 24%  sched_debug.cfs_rq[2]:/.load
        88 ± 19%     -24.1%         66 ± 23%  sched_debug.cpu#2.load
       100 ± 11%     +20.3%        121 ± 13%  sched_debug.cpu#6.load
        87 ± 10%     -24.6%         66 ± 10%  sched_debug.cfs_rq[1]:/.load
       787 ± 13%     -22.1%        613 ±  9%  sched_debug.cfs_rq[4]:/.blocked_load_avg
      7.05 ±  0%     +12.0%       7.89 ±  1%  perf-profile.cpu-cycles.get_futex_key.futex_wake.do_futex.sys_futex.system_call_fastpath
      2132 ± 11%     +21.8%       2597 ± 12%  cpuidle.C1-NHM.usage
        77 ±  9%     -15.4%         65 ± 10%  sched_debug.cfs_rq[1]:/.runnable_load_avg
       100 ± 13%     +17.4%        118 ± 10%  sched_debug.cpu#6.cpu_load[1]
       101 ± 14%     +17.8%        119 ± 10%  sched_debug.cpu#6.cpu_load[2]
        85 ± 10%     -19.7%         68 ±  8%  sched_debug.cpu#1.load
     38.14 ±  0%     +13.0%      43.08 ±  0%  perf-profile.cpu-cycles.do_futex.sys_futex.system_call_fastpath.syscall
    272.17 ±  0%      -9.3%     246.76 ±  0%  time.user_time
      3.24 ±  4%     -12.3%       2.84 ±  2%  perf-profile.cpu-cycles.testcase
     43.30 ±  0%     +10.3%      47.76 ±  0%  perf-profile.cpu-cycles.sys_futex.system_call_fastpath.syscall
      3152 ±  6%     -12.5%       2758 ±  8%  sched_debug.cpu#2.curr->pid
        74 ±  4%     -13.5%         64 ±  7%  sched_debug.cpu#1.cpu_load[0]
     11.00 ±  2%     -10.8%       9.81 ±  1%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
     10.10 ±  1%     -14.3%       8.66 ±  1%  perf-profile.cpu-cycles.system_call.syscall
    729331 ±  4%     +11.8%     815471 ±  4%  sched_debug.cfs_rq[6]:/.min_vruntime
    110881 ± 10%     +13.5%     125833 ±  3%  sched_debug.cfs_rq[6]:/.exec_clock
      3.26 ±  0%     -12.7%       2.85 ±  1%  perf-profile.cpu-cycles.sysret_check.syscall
    112231 ± 10%     +13.1%     126884 ±  3%  sched_debug.cpu#6.nr_load_updates
        69 ±  3%     -12.3%         60 ±  8%  sched_debug.cpu#2.cpu_load[0]
       693 ±  6%     -10.9%        617 ±  2%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
     31740 ±  6%     -10.5%      28410 ±  2%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
    566208 ±  7%     -10.6%     505972 ±  2%  sched_debug.cfs_rq[0]:/.min_vruntime

testbox/testcase/testparams: lkp-snb01/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
  11940878 ±  0%     -17.1%    9902263 ±  0%  will-it-scale.per_thread_ops
  11923215 ±  0%     -17.3%    9861898 ±  0%  will-it-scale.per_process_ops
      0.61 ±  0%     +12.0%       0.68 ±  0%  will-it-scale.scalability
      0.85 ±  1%   +1283.8%      11.73 ±  0%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
       235 ± 47%    +154.9%        600 ± 26%  sched_debug.cfs_rq[25]:/.blocked_load_avg
       162 ± 19%     -40.2%         97 ± 40%  sched_debug.cfs_rq[22]:/.blocked_load_avg
       272 ± 35%    +130.6%        627 ± 26%  sched_debug.cfs_rq[25]:/.tg_load_contrib
       900 ± 31%     -62.6%        337 ± 46%  sched_debug.cpu#29.ttwu_local
       192 ± 16%     -33.8%        127 ± 31%  sched_debug.cfs_rq[22]:/.tg_load_contrib
       354 ± 15%    +179.4%        991 ± 40%  sched_debug.cpu#22.ttwu_count
       181 ± 43%     -49.9%         90 ±  8%  sched_debug.cpu#19.ttwu_local
      1491 ± 37%     -35.8%        958 ± 24%  sched_debug.cpu#26.sched_goidle
       524 ± 45%     -48.3%        271 ± 18%  sched_debug.cfs_rq[26]:/.tg_load_contrib
    312620 ± 45%     +88.4%     588988 ± 18%  sched_debug.cfs_rq[16]:/.spread0
       490 ± 49%     -50.2%        244 ± 14%  sched_debug.cfs_rq[26]:/.blocked_load_avg
      3749 ± 27%     -39.5%       2268 ± 27%  sched_debug.cpu#29.nr_switches
     14.06 ±  0%     +99.3%      28.02 ±  0%  perf-profile.cpu-cycles.futex_wake.do_futex.sys_futex.system_call_fastpath.syscall
      4096 ± 24%     -38.9%       2502 ± 18%  sched_debug.cpu#3.nr_switches
      1517 ± 19%     -26.3%       1118 ± 18%  sched_debug.cpu#3.sched_goidle
      3955 ± 18%     -18.5%       3225 ±  8%  sched_debug.cpu#28.curr->pid
      6.22 ±  1%     +53.6%       9.55 ±  0%  perf-profile.cpu-cycles.get_futex_key.futex_wake.do_futex.sys_futex.system_call_fastpath
        51 ± 11%     -20.0%         41 ±  3%  sched_debug.cpu#0.load
        51 ± 11%     -20.0%         41 ±  3%  sched_debug.cfs_rq[0]:/.load
      3865 ± 12%     +29.7%       5013 ± 19%  sched_debug.cpu#15.sched_goidle
     82552 ±  6%     -17.5%      68090 ±  3%  sched_debug.cpu#0.nr_load_updates
      1.24 ±  3%     -19.5%       1.00 ±  2%  perf-profile.cpu-cycles.drop_futex_key_refs.isra.12.do_futex.sys_futex.system_call_fastpath.syscall
     36.98 ±  1%     +32.7%      49.09 ±  0%  perf-profile.cpu-cycles.do_futex.sys_futex.system_call_fastpath.syscall
     43.52 ±  1%     +25.0%      54.42 ±  0%  perf-profile.cpu-cycles.system_call_fastpath.syscall
        54 ±  8%     +16.6%         63 ±  8%  sched_debug.cpu#16.cpu_load[1]
        54 ±  6%     +19.8%         65 ± 10%  sched_debug.cpu#16.cpu_load[2]
    666.39 ±  0%     -14.9%     566.95 ±  0%  time.user_time
        54 ±  5%     +23.6%         66 ± 14%  sched_debug.cpu#16.cpu_load[4]
      4.59 ±  1%     -20.6%       3.65 ±  1%  perf-profile.cpu-cycles.testcase
      4478 ±  5%     -12.4%       3921 ±  6%  sched_debug.cpu#0.curr->pid
     41.87 ±  1%     +26.7%      53.06 ±  0%  perf-profile.cpu-cycles.sys_futex.system_call_fastpath.syscall
     11.37 ±  1%     -19.8%       9.12 ±  0%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
     13.93 ±  2%     -18.7%      11.32 ±  1%  perf-profile.cpu-cycles.system_call.syscall
      2.46 ±  4%     -17.3%       2.04 ±  3%  perf-profile.cpu-cycles.sysret_check.syscall
     83964 ±  6%     -18.1%      68727 ±  4%  sched_debug.cfs_rq[0]:/.exec_clock
        54 ±  6%     +22.7%         66 ± 13%  sched_debug.cpu#16.cpu_load[3]
    102941 ±  4%     +15.5%     118874 ±  2%  sched_debug.cfs_rq[16]:/.exec_clock
       645 ±  5%     -13.2%        560 ±  4%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
     29634 ±  5%     -13.2%      25712 ±  4%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
   1087374 ±  6%     -13.1%     944559 ±  6%  sched_debug.cfs_rq[0]:/.min_vruntime
      2464 ±  6%      -9.0%       2243 ±  2%  numa-meminfo.node1.KernelStack
    110191 ±  4%     +12.1%     123516 ±  2%  sched_debug.cpu#16.nr_load_updates
     34745 ±  4%     +11.5%      38751 ±  3%  sched_debug.cfs_rq[16]:/.avg->runnable_avg_sum
       757 ±  4%     +11.5%        844 ±  3%  sched_debug.cfs_rq[16]:/.tg_runnable_contrib
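
The perf-profile rows are shares of sampled cycles, so a very large %change on a tiny baseline can be hard to compare with the other rows. The +1283.8% for get_futex_key_refs, for instance, starts from 0.85% of cycles. The sketch below restates three rows from this table as absolute percentage-point deltas next to the relative change. The shortened row labels are illustrative only.

```python
# perf-profile.cpu-cycles shares (% of sampled cycles) from the
# lkp-snb01 futex3 table: (label, base %, new %, reported %change).
PROFILE = [
    ("get_futex_key_refs.isra.11 (futex_wake path)", 0.85, 11.73, 1283.8),
    ("futex_wake", 14.06, 28.02, 99.3),
    ("do_futex", 36.98, 49.09, 32.7),
]

def point_delta(base, new):
    """Absolute change in share of cycles, in percentage points."""
    return new - base

def relative_change(base, new):
    """Relative change against the base share, in percent."""
    return (new - base) / base * 100.0

for label, base, new, reported in PROFILE:
    print(f"{label:46s} {point_delta(base, new):+6.2f} pp  "
          f"{relative_change(base, new):+8.1f}% (reported {reported:+.1f}%)")
```

Seen this way, get_futex_key_refs grows by roughly 10.9 percentage points, a scale comparable to the other futex rows rather than an outlier.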

testbox/testcase/testparams: lkp-sbx04/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
  11423732 ±  0%     -17.8%    9387203 ±  0%  will-it-scale.per_thread_ops
  11419511 ±  0%     -18.0%    9368303 ±  0%  will-it-scale.per_process_ops
      0.60 ±  0%     +11.9%       0.68 ±  0%  will-it-scale.scalability
      0.84 ±  3%   +1303.0%      11.82 ±  0%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
        68 ± 41%    +125.1%        154 ± 37%  sched_debug.cfs_rq[31]:/.blocked_load_avg
        76 ± 38%    +111.5%        161 ± 35%  sched_debug.cfs_rq[31]:/.tg_load_contrib
      1603 ± 47%     -53.1%        751 ± 35%  sched_debug.cpu#6.ttwu_count
       937 ± 13%     +84.1%       1726 ± 41%  sched_debug.cpu#30.ttwu_local
      3733 ± 12%     +37.0%       5117 ± 23%  sched_debug.cpu#14.sched_count
       131 ± 17%    +146.6%        324 ± 22%  sched_debug.cpu#47.ttwu_local
        96 ± 12%    +106.2%        199 ± 35%  sched_debug.cpu#33.ttwu_local
       199 ± 32%     -50.1%         99 ± 16%  sched_debug.cfs_rq[53]:/.blocked_load_avg
       210 ± 31%     -48.2%        109 ± 12%  sched_debug.cfs_rq[53]:/.tg_load_contrib
      2354 ± 19%     -41.9%       1368 ± 33%  sched_debug.cpu#48.nr_switches
        47 ± 44%    +131.9%        110 ± 43%  sched_debug.cfs_rq[33]:/.blocked_load_avg
        11 ± 36%     -45.7%          6 ±  6%  sched_debug.cpu#31.load
      2824 ± 19%     -37.3%       1771 ± 29%  sched_debug.cpu#63.nr_switches
         7 ±  5%     +61.3%         12 ± 20%  sched_debug.cpu#62.cpu_load[0]
      2846 ± 18%     -37.4%       1783 ± 29%  sched_debug.cpu#63.sched_count
       183 ± 21%     -52.3%         87 ± 33%  sched_debug.cfs_rq[46]:/.blocked_load_avg
        70 ± 30%     +89.3%        132 ± 36%  sched_debug.cfs_rq[33]:/.tg_load_contrib
      1064 ± 10%     -35.2%        689 ± 25%  sched_debug.cpu#63.sched_goidle
      5806 ± 11%     +23.7%       7184 ± 18%  sched_debug.cpu#28.nr_switches
     14.28 ±  0%     +97.8%      28.25 ±  0%  perf-profile.cpu-cycles.futex_wake.do_futex.sys_futex.system_call_fastpath.syscall
       198 ± 20%     -48.4%        102 ± 28%  sched_debug.cfs_rq[46]:/.tg_load_contrib
       501 ±  5%     -49.9%        251 ± 47%  sched_debug.cpu#40.ttwu_local
         6 ±  6%     +40.0%          8 ± 14%  sched_debug.cpu#25.cpu_load[1]
       803 ± 26%     -33.1%        538 ± 12%  sched_debug.cpu#53.ttwu_count
        11 ± 28%     +48.9%         17 ± 15%  sched_debug.cpu#51.cpu_load[0]
        83 ± 17%     +42.7%        119 ± 26%  sched_debug.cpu#39.ttwu_local
         6 ±  6%     +32.0%          8 ± 15%  sched_debug.cpu#25.cpu_load[2]
        22 ± 23%     -30.0%         15 ± 12%  sched_debug.cpu#45.cpu_load[1]
      2528 ±  7%     -19.1%       2045 ± 17%  sched_debug.cpu#18.sched_goidle
      1219 ± 22%     +39.0%       1695 ± 19%  sched_debug.cpu#47.sched_count
      1208 ± 22%     +27.3%       1538 ±  5%  sched_debug.cpu#47.nr_switches
       944 ± 13%     -32.7%        635 ± 15%  sched_debug.cpu#40.ttwu_count
        19 ± 12%     -23.4%         14 ±  5%  sched_debug.cpu#45.cpu_load[2]
      2946 ± 10%     +18.9%       3502 ± 12%  sched_debug.cpu#61.curr->pid
       561 ± 10%     +35.3%        759 ±  5%  sched_debug.cpu#47.ttwu_count
      6.28 ±  0%     +54.7%       9.71 ±  1%  perf-profile.cpu-cycles.get_futex_key.futex_wake.do_futex.sys_futex.system_call_fastpath
         6 ±  0%     +29.2%          7 ± 10%  sched_debug.cpu#25.cpu_load[3]
       531 ± 12%     +31.8%        700 ±  5%  sched_debug.cpu#44.ttwu_count
      1.27 ±  1%     -22.1%       0.99 ±  1%  perf-profile.cpu-cycles.drop_futex_key_refs.isra.12.do_futex.sys_futex.system_call_fastpath.syscall
     37.44 ±  0%     +32.2%      49.50 ±  0%  perf-profile.cpu-cycles.do_futex.sys_futex.system_call_fastpath.syscall
      1004 ±  5%     +24.3%       1247 ±  6%  numa-meminfo.node3.PageTables
       253 ±  6%     +23.2%        312 ±  6%  numa-vmstat.node3.nr_page_table_pages
     44.13 ±  0%     +24.3%      54.84 ±  0%  perf-profile.cpu-cycles.system_call_fastpath.syscall
      1967 ±  7%     -12.4%       1722 ±  5%  numa-meminfo.node2.KernelStack
      3710 ±  4%     -14.0%       3191 ±  1%  sched_debug.cpu#31.curr->pid
   1141.59 ±  0%     -16.2%     956.67 ±  0%  time.user_time
      4.56 ±  0%     -18.7%       3.71 ±  2%  perf-profile.cpu-cycles.testcase
     42.42 ±  0%     +26.2%      53.51 ±  0%  perf-profile.cpu-cycles.sys_futex.system_call_fastpath.syscall
    436335 ±  2%      -9.8%     393521 ±  1%  softirqs.RCU
      3339 ±  2%     +12.7%       3764 ±  9%  sched_debug.cpu#54.curr->pid
    973122 ±  0%      +9.1%    1062051 ±  8%  sched_debug.cfs_rq[53]:/.min_vruntime
     11.40 ±  0%     -19.5%       9.18 ±  0%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
     14.13 ±  0%     -19.8%      11.34 ±  0%  perf-profile.cpu-cycles.system_call.syscall
      2.46 ±  0%     -19.6%       1.98 ±  2%  perf-profile.cpu-cycles.sysret_check.syscall
      1820 ± 13%     -29.1%       1290 ± 22%  sched_debug.cpu#40.nr_switches
         8 ±  0%     +12.5%          9 ±  0%  sched_debug.cpu#16.cpu_load[3]
      1630 ± 14%     +38.0%       2250 ± 12%  sched_debug.cpu#13.sched_goidle
     12601 ±  7%     -11.4%      11165 ±  3%  numa-meminfo.node0.SReclaimable
      3149 ±  7%     -11.4%       2791 ±  3%  numa-vmstat.node0.nr_slab_reclaimable
      9845 ±  9%     +15.7%      11388 ±  7%  numa-meminfo.node1.SReclaimable
      2460 ±  9%     +15.7%       2846 ±  7%  numa-vmstat.node1.nr_slab_reclaimable
      1410 ±  5%      +9.6%       1545 ±  4%  slabinfo.mm_struct.active_objs
      3055 ± 12%     +14.1%       3484 ±  8%  sched_debug.cpu#26.curr->pid
      1410 ±  5%      +9.6%       1545 ±  4%  slabinfo.mm_struct.num_objs
      5757 ±  7%     +13.3%       6521 ±  6%  numa-vmstat.node3.nr_slab_unreclaimable
     23031 ±  7%     +13.3%      26086 ±  6%  numa-meminfo.node3.SUnreclaim

testbox/testcase/testparams: lkp-snb01/will-it-scale/performance-futex4

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
   7502355 ±  0%     -11.5%    6637575 ±  1%  will-it-scale.per_thread_ops
   7513211 ±  0%     -10.9%    6692022 ±  0%  will-it-scale.per_process_ops
      0.65 ±  0%      +3.9%       0.68 ±  0%  will-it-scale.scalability
      0.53 ±  3%   +1423.7%       8.04 ±  1%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wait_setup.futex_wait.do_futex.sys_futex
      1699 ± 37%     -68.6%        533 ± 35%  sched_debug.cpu#25.ttwu_local
       513 ± 39%     -50.7%        253 ± 38%  sched_debug.cfs_rq[27]:/.tg_load_contrib
      2846 ± 28%     -36.7%       1801 ±  5%  sched_debug.cpu#25.ttwu_count
        33 ± 45%     -54.9%         15 ± 18%  sched_debug.cpu#10.load
       450 ± 26%    +165.3%       1195 ± 45%  sched_debug.cpu#23.sched_goidle
        18 ±  8%     +81.1%         33 ± 38%  sched_debug.cpu#29.cpu_load[2]
      5401 ± 26%     -49.1%       2748 ±  8%  sched_debug.cpu#25.nr_switches
        17 ± 14%     +71.4%         30 ± 40%  sched_debug.cfs_rq[14]:/.load
        37 ± 23%     -50.7%         18 ± 17%  sched_debug.cpu#26.cpu_load[0]
       493 ± 41%     -52.4%        234 ± 42%  sched_debug.cfs_rq[27]:/.blocked_load_avg
        17 ±  4%     +51.4%         26 ± 27%  sched_debug.cpu#29.cpu_load[3]
      2116 ± 32%     +87.3%       3963 ± 44%  sched_debug.cpu#28.nr_switches
       752 ± 11%     +78.1%       1340 ± 32%  sched_debug.cpu#28.sched_goidle
     11262 ± 14%     -25.6%       8382 ± 20%  sched_debug.cpu#10.nr_switches
      3179 ±  4%     +35.0%       4292 ± 19%  sched_debug.cpu#29.curr->pid
      4945 ± 14%     -25.4%       3687 ± 21%  sched_debug.cpu#10.sched_goidle
      4.60 ±  1%     +56.9%       7.21 ±  0%  perf-profile.cpu-cycles.get_futex_key.futex_wait_setup.futex_wait.do_futex.sys_futex
        16 ±  2%     +33.8%         21 ± 19%  sched_debug.cpu#29.cpu_load[4]
        22 ± 27%     -36.4%         14 ± 14%  sched_debug.cpu#10.cpu_load[0]
      1676 ±  5%     -18.0%       1374 ± 15%  numa-meminfo.node0.PageTables
       418 ±  5%     -17.8%        343 ± 15%  numa-vmstat.node0.nr_page_table_pages
        16 ±  2%     +24.6%         20 ±  9%  sched_debug.cpu#13.load
        16 ±  0%     +25.0%         20 ±  9%  sched_debug.cfs_rq[13]:/.load
        16 ±  0%     +26.6%         20 ±  9%  sched_debug.cpu#13.cpu_load[0]
      3345 ± 14%     +27.5%       4266 ±  8%  sched_debug.cpu#13.ttwu_count
     10925 ±  4%     -15.6%       9218 ±  3%  slabinfo.kmalloc-256.active_objs
     31.80 ±  1%     +24.9%      39.74 ±  0%  perf-profile.cpu-cycles.futex_wait_setup.futex_wait.do_futex.sys_futex.system_call_fastpath
     11691 ±  4%     -14.7%       9976 ±  3%  slabinfo.kmalloc-256.num_objs
    465.75 ±  0%     -11.2%     413.51 ±  0%  time.user_time
        16 ±  0%     +15.6%         18 ±  6%  sched_debug.cpu#13.cpu_load[1]
      7.28 ±  1%     -11.7%       6.43 ±  2%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
      8.14 ±  0%     -11.1%       7.23 ±  1%  perf-profile.cpu-cycles.system_call.syscall
      1.77 ±  3%     -10.1%       1.59 ±  2%  perf-profile.cpu-cycles.sysret_check.syscall
      1.97 ±  2%      -8.5%       1.80 ±  2%  perf-profile.cpu-cycles._raw_spin_unlock.futex_wait_setup.futex_wait.do_futex.sys_futex
      3865 ± 13%     +15.8%       4474 ±  3%  numa-vmstat.node1.nr_anon_pages
     15441 ± 13%     +15.9%      17889 ±  3%  numa-meminfo.node1.AnonPages
      6.95 ±  2%     -11.4%       6.16 ±  0%  perf-profile.cpu-cycles._raw_spin_lock.futex_wait_setup.futex_wait.do_futex.sys_futex
     17320 ± 12%     -14.0%      14897 ±  3%  numa-meminfo.node0.Active(anon)
      4329 ± 12%     -14.0%       3723 ±  3%  numa-vmstat.node0.nr_active_anon
        16 ±  0%     +12.5%         18 ±  6%  sched_debug.cpu#12.cpu_load[4]
     12.79 ±  1%      -9.9%      11.52 ±  0%  perf-profile.cpu-cycles.hash_futex.futex_wait.do_futex.sys_futex.system_call_fastpath
     16945 ± 12%     -14.2%      14539 ±  3%  numa-meminfo.node0.AnonPages
      4235 ± 12%     -14.2%       3634 ±  3%  numa-vmstat.node0.nr_anon_pages

testbox/testcase/testparams: nhm4/will-it-scale/performance-futex4

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
   7653177 ±  0%      -6.5%    7158623 ±  0%  will-it-scale.per_thread_ops
   7616372 ±  0%      -6.5%    7119979 ±  0%  will-it-scale.per_process_ops
      0.70 ±  0%      +2.4%       0.71 ±  0%  will-it-scale.scalability
      1.31 ±  0%    +385.8%       6.37 ±  2%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wait_setup.futex_wait.do_futex.sys_futex
       847 ± 22%     -39.7%        510 ± 21%  sched_debug.cfs_rq[3]:/.blocked_load_avg
       979 ± 18%     -34.4%        642 ± 16%  sched_debug.cfs_rq[3]:/.tg_load_contrib
       139 ± 18%     -25.7%        103 ± 17%  sched_debug.cpu#6.load
      5.35 ±  2%     +25.7%       6.73 ±  2%  perf-profile.cpu-cycles.get_futex_key.futex_wait_setup.futex_wait.do_futex.sys_futex
       163 ± 11%     -17.2%        135 ± 11%  sched_debug.cpu#4.cpu_load[0]
     30.79 ±  1%     +15.4%      35.54 ±  1%  perf-profile.cpu-cycles.futex_wait_setup.futex_wait.do_futex.sys_futex.system_call_fastpath
    146.40 ±  2%     -36.7%      92.65 ±  2%  time.user_time
      2734 ±  5%     +10.7%       3027 ±  7%  sched_debug.cpu#0.curr->pid
      7.93 ±  2%      -8.3%       7.27 ±  3%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
      7.20 ±  2%     -10.7%       6.44 ±  1%  perf-profile.cpu-cycles.system_call.syscall
      2.28 ±  3%      -8.6%       2.08 ±  3%  perf-profile.cpu-cycles.sysret_check.syscall
     12.98 ±  1%     -10.2%      11.65 ±  2%  perf-profile.cpu-cycles.hash_futex.futex_wait.do_futex.sys_futex.system_call_fastpath

testbox/testcase/testparams: lkp-sbx04/will-it-scale/performance-futex4

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
   7139956 ±  0%     -11.0%    6357641 ±  0%  will-it-scale.per_thread_ops
   7161715 ±  0%     -11.3%    6351262 ±  0%  will-it-scale.per_process_ops
      0.64 ±  0%      +4.5%       0.67 ±  0%  will-it-scale.scalability
       379 ± 48%     -72.3%        105 ± 35%  sched_debug.cpu#37.ttwu_local
      0.53 ±  3%   +1395.3%       7.92 ±  1%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wait_setup.futex_wait.do_futex.sys_futex
       860 ± 22%     +44.8%       1245 ± 16%  sched_debug.cpu#17.ttwu_local
      1025 ± 43%     +95.4%       2003 ± 42%  sched_debug.cpu#32.sched_count
      1010 ± 43%     +96.8%       1989 ± 43%  sched_debug.cpu#32.nr_switches
       361 ± 28%     -46.7%        192 ± 19%  sched_debug.cfs_rq[60]:/.blocked_load_avg
       368 ± 27%     -45.8%        199 ± 20%  sched_debug.cfs_rq[60]:/.tg_load_contrib
       525 ± 20%     +63.2%        857 ± 30%  sched_debug.cpu#48.sched_goidle
      1292 ± 18%    +109.9%       2713 ± 37%  sched_debug.cpu#48.nr_switches
       375 ± 30%     +60.9%        603 ± 30%  sched_debug.cpu#32.sched_goidle
      1903 ± 17%     +35.2%       2572 ± 21%  sched_debug.cpu#23.ttwu_count
       888 ± 33%     +53.7%       1365 ± 21%  sched_debug.cpu#34.sched_count
       777 ± 21%     +52.7%       1186 ±  7%  sched_debug.cpu#21.ttwu_local
       504 ± 21%     +44.3%        727 ± 11%  sched_debug.cpu#55.sched_goidle
      1410 ± 16%     -46.8%        749 ± 11%  sched_debug.cpu#60.ttwu_count
       622 ± 21%     -44.8%        343 ± 17%  sched_debug.cpu#47.sched_goidle
       148 ± 36%     +58.8%        235 ± 22%  sched_debug.cfs_rq[49]:/.blocked_load_avg
       878 ± 33%     +54.0%       1352 ± 21%  sched_debug.cpu#34.nr_switches
       645 ±  8%     +98.8%       1283 ± 42%  sched_debug.cpu#48.ttwu_count
       157 ± 34%     +56.0%        245 ± 21%  sched_debug.cfs_rq[49]:/.tg_load_contrib
        91 ± 34%     +89.1%        173 ± 37%  sched_debug.cfs_rq[41]:/.blocked_load_avg
      1211 ± 20%     +53.1%       1855 ± 12%  sched_debug.cpu#55.nr_switches
       109 ± 26%     +74.7%        191 ± 34%  sched_debug.cfs_rq[41]:/.tg_load_contrib
      2418 ± 25%     +62.7%       3934 ± 17%  numa-vmstat.node3.nr_active_anon
      9680 ± 25%     +62.6%      15742 ± 17%  numa-meminfo.node3.Active(anon)
       744 ± 32%     -36.1%        475 ± 19%  sched_debug.cpu#45.ttwu_count
       546 ± 19%     +34.6%        735 ± 14%  sched_debug.cpu#61.sched_goidle
        11 ± 14%     -25.0%          8 ± 13%  sched_debug.cpu#54.load
      1912 ±  7%     +48.1%       2831 ± 10%  sched_debug.cpu#17.ttwu_count
      4.64 ±  0%     +56.4%       7.26 ±  1%  perf-profile.cpu-cycles.get_futex_key.futex_wait_setup.futex_wait.do_futex.sys_futex
        14 ±  8%     +28.8%         19 ±  8%  sched_debug.cpu#46.cpu_load[0]
      4908 ±  6%     +34.3%       6593 ±  8%  sched_debug.cpu#17.nr_switches
      2287 ±  3%     +29.7%       2965 ±  6%  sched_debug.cpu#17.sched_goidle
  66715375 ± 27%     -38.5%   41040216 ± 25%  cpuidle.C1-SNB.time
      1866 ±  7%     -29.1%       1323 ±  9%  sched_debug.cpu#60.nr_switches
      1877 ±  7%     -21.7%       1470 ± 12%  sched_debug.cpu#60.sched_count
      3412 ±  3%     +34.1%       4577 ±  5%  numa-vmstat.node2.nr_anon_pages
     13651 ±  3%     +34.1%      18306 ±  5%  numa-meminfo.node2.AnonPages
      1859 ± 10%     +52.1%       2828 ± 21%  sched_debug.cpu#62.sched_count
     30187 ±  9%     +22.8%      37073 ±  8%  numa-meminfo.node3.Active
       199 ±  2%     +27.4%        253 ±  7%  sched_debug.cpu#35.ttwu_count
       307 ± 27%     -28.0%        221 ± 17%  sched_debug.cfs_rq[56]:/.tg_load_contrib
     13213 ±  6%     -14.4%      11313 ± 14%  numa-meminfo.node2.SReclaimable
      3302 ±  6%     -14.4%       2828 ± 14%  numa-vmstat.node2.nr_slab_reclaimable
     32.22 ±  0%     +24.4%      40.07 ±  1%  perf-profile.cpu-cycles.futex_wait_setup.futex_wait.do_futex.sys_futex.system_call_fastpath
    794.01 ±  0%     -11.4%     703.26 ±  0%  time.user_time
     27693 ±  4%     +11.5%      30884 ±  3%  numa-vmstat.node3.nr_file_pages
    110774 ±  4%     +11.5%     123540 ±  3%  numa-meminfo.node3.FilePages
      7.30 ±  0%     -14.6%       6.24 ±  0%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
    122283 ±  3%     -10.2%     109866 ±  3%  numa-meminfo.node2.FilePages
     30570 ±  3%     -10.2%      27466 ±  3%  numa-vmstat.node2.nr_file_pages
      1.78 ±  2%     -13.1%       1.55 ±  3%  perf-profile.cpu-cycles.sysret_check.syscall
      2.04 ±  3%     -12.0%       1.79 ±  1%  perf-profile.cpu-cycles._raw_spin_unlock.futex_wait_setup.futex_wait.do_futex.sys_futex
      7.00 ±  0%     -12.9%       6.09 ±  2%  perf-profile.cpu-cycles._raw_spin_lock.futex_wait_setup.futex_wait.do_futex.sys_futex
        14 ±  5%     +15.3%         17 ±  4%  sched_debug.cpu#46.cpu_load[1]
      1363 ±  2%     +10.6%       1507 ±  3%  slabinfo.mm_struct.active_objs
     12.81 ±  1%     -12.0%      11.27 ±  1%  perf-profile.cpu-cycles.hash_futex.futex_wait.do_futex.sys_futex.system_call_fastpath

testbox/testcase/testparams: lkp-g5/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
   8319207 ±  0%      -9.9%    7498648 ±  0%  will-it-scale.per_thread_ops
   8330959 ±  0%      -9.9%    7503988 ±  0%  will-it-scale.per_process_ops
      1800 ±  0%     -10.3%       1615 ±  0%  will-it-scale.time.user_time
      0.58 ±  0%      +7.4%       0.63 ±  0%  will-it-scale.scalability
      5374 ±  0%      +3.4%       5557 ±  0%  will-it-scale.time.system_time
      2.37 ± 11%    +335.3%      10.30 ± 10%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
        13 ± 31%    +250.0%         45 ± 44%  sched_debug.cfs_rq[13]:/.tg_load_contrib
       909 ± 46%    +124.5%       2041 ± 14%  sched_debug.cpu#65.ttwu_local
       472 ± 12%    +184.0%       1340 ± 49%  sched_debug.cpu#2.sched_goidle
      1642 ± 19%     -55.8%        726 ± 43%  sched_debug.cpu#27.ttwu_local
      2326 ± 21%     -70.9%        676 ± 38%  sched_debug.cpu#32.ttwu_local
       351 ± 23%     +81.6%        637 ± 35%  sched_debug.cpu#23.ttwu_local
      1674 ± 47%    +151.0%       4203 ± 33%  numa-meminfo.node6.Active(anon)
       418 ± 48%    +151.3%       1050 ± 33%  numa-vmstat.node6.nr_active_anon
      1754 ± 46%    +144.9%       4297 ± 33%  numa-meminfo.node6.AnonPages
       438 ± 46%    +145.0%       1073 ± 33%  numa-vmstat.node6.nr_anon_pages
      4561 ± 27%     -70.9%       1328 ± 27%  sched_debug.cpu#32.ttwu_count
      1261 ± 15%    +145.4%       3094 ± 46%  sched_debug.cpu#2.nr_switches
         1 ± 35%     +87.5%          2 ± 20%  sched_debug.cfs_rq[28]:/.nr_spread_over
      2285 ± 23%     +38.1%       3155 ± 27%  sched_debug.cpu#23.nr_switches
      1621 ±  2%     -47.2%        855 ± 41%  sched_debug.cpu#29.ttwu_local
        10 ±  4%     +86.3%         19 ± 33%  sched_debug.cfs_rq[3]:/.tg_load_contrib
       351 ± 30%     +64.5%        578 ± 31%  sched_debug.cpu#24.ttwu_local
      2931 ± 11%     -48.9%       1499 ± 22%  sched_debug.cpu#27.ttwu_count
      1044 ± 24%     +42.4%       1486 ± 28%  sched_debug.cpu#23.sched_goidle
      8008 ±  6%     -57.6%       3395 ± 25%  sched_debug.cpu#32.nr_switches
       362 ± 30%     +77.4%        643 ± 34%  sched_debug.cpu#20.ttwu_local
        21 ± 27%     +66.7%         35 ± 26%  sched_debug.cfs_rq[100]:/.tg_load_contrib
      3168 ± 18%     -50.3%       1573 ± 34%  sched_debug.cpu#29.ttwu_count
       426 ± 41%     +57.4%        671 ± 29%  sched_debug.cpu#34.sched_goidle
        18 ± 37%    +114.3%         40 ± 25%  sched_debug.cfs_rq[1]:/.tg_load_contrib
       985 ± 30%     +56.2%       1538 ± 19%  sched_debug.cpu#22.ttwu_count
       945 ± 28%     +56.7%       1482 ± 32%  sched_debug.cpu#21.ttwu_count
        22 ± 38%     +57.7%         35 ± 36%  sched_debug.cfs_rq[103]:/.tg_load_contrib
       389 ± 18%     +89.7%        738 ±  8%  sched_debug.cpu#87.ttwu_local
        32 ± 41%     +74.7%         56 ± 26%  sched_debug.cfs_rq[64]:/.blocked_load_avg
       887 ± 24%     +59.4%       1413 ± 32%  sched_debug.cpu#20.ttwu_count
      6242 ±  9%     -42.9%       3562 ± 17%  sched_debug.cpu#27.nr_switches
      3293 ± 18%     -54.7%       1493 ± 25%  sched_debug.cpu#32.sched_goidle
        37 ± 36%     +70.9%         63 ± 19%  sched_debug.cfs_rq[64]:/.tg_load_contrib
       759 ± 18%     +75.3%       1330 ± 18%  sched_debug.cpu#87.ttwu_count
       919 ± 28%     +54.2%       1416 ± 32%  sched_debug.cpu#23.ttwu_count
       183 ± 35%     -65.8%         62 ± 38%  sched_debug.cfs_rq[121]:/.blocked_load_avg
       185 ± 34%     -64.6%         65 ± 35%  sched_debug.cfs_rq[121]:/.tg_load_contrib
       355 ± 25%     +74.0%        618 ± 35%  sched_debug.cpu#21.ttwu_local
       538 ± 17%     +66.1%        893 ± 33%  sched_debug.cpu#111.sched_count
      1784 ±  6%     +48.9%       2657 ± 27%  sched_debug.cpu#2.ttwu_count
      1986 ±  5%     +70.2%       3379 ± 14%  sched_debug.cpu#65.sched_goidle
      2955 ±  9%     -45.5%       1609 ± 18%  sched_debug.cpu#27.sched_goidle
      1149 ± 38%     +56.4%       1796 ± 24%  sched_debug.cpu#34.nr_switches
     23615 ± 21%     -46.6%      12602 ± 24%  sched_debug.cpu#88.sched_count
       703 ± 18%     -24.4%        532 ± 12%  sched_debug.cpu#48.ttwu_count
      6768 ± 15%     -37.6%       4221 ± 20%  sched_debug.cpu#29.nr_switches
      4471 ±  7%     +66.1%       7429 ± 13%  sched_debug.cpu#65.nr_switches
    644953 ±  7%     -44.9%     355582 ± 21%  sched_debug.cfs_rq[94]:/.min_vruntime
       728 ±  8%     +11.1%        809 ±  9%  sched_debug.cpu#55.nr_switches
       235 ± 11%     +60.7%        378 ± 32%  sched_debug.cpu#115.ttwu_count
      2547 ±  8%     +34.0%       3411 ± 23%  sched_debug.cpu#18.nr_switches
       268 ± 14%     +38.1%        370 ± 23%  sched_debug.cpu#102.sched_goidle
      1480 ±  5%     +59.4%       2358 ± 24%  numa-vmstat.node3.nr_slab_reclaimable
      5921 ±  5%     +59.4%       9436 ± 24%  numa-meminfo.node3.SReclaimable
       529 ± 17%     +32.9%        703 ± 19%  sched_debug.cpu#111.nr_switches
         8 ± 19%     +55.8%         13 ± 21%  sched_debug.cpu#41.cpu_load[0]
       845 ±  8%     +50.6%       1272 ±  4%  sched_debug.cpu#87.sched_goidle
      3208 ± 15%     -40.0%       1924 ± 23%  sched_debug.cpu#29.sched_goidle
      1836 ±  5%     +52.4%       2798 ± 11%  sched_debug.cpu#87.nr_switches
      2424 ± 31%     +43.9%       3489 ± 11%  sched_debug.cpu#65.ttwu_count
       167 ± 16%     +33.1%        222 ± 17%  sched_debug.cpu#103.ttwu_count
       916 ± 15%     +50.5%       1379 ± 15%  sched_debug.cpu#84.ttwu_count
       161 ±  8%     +41.6%        229 ± 14%  sched_debug.cpu#51.ttwu_local
     12493 ± 19%     -31.2%       8594 ± 14%  sched_debug.cfs_rq[122]:/.exec_clock
       153 ±  3%     +23.9%        190 ± 19%  sched_debug.cpu#55.ttwu_local
      9141 ±  4%     +29.4%      11831 ± 10%  sched_debug.cfs_rq[126]:/.avg->runnable_avg_sum
       230 ± 17%     +31.1%        302 ± 18%  sched_debug.cpu#111.sched_goidle
       198 ±  3%     +29.7%        256 ± 10%  sched_debug.cfs_rq[126]:/.tg_runnable_contrib
    652664 ± 21%     -36.4%     415054 ± 15%  meminfo.Committed_AS
      1196 ±  8%     +34.2%       1606 ± 24%  sched_debug.cpu#18.sched_goidle
      1583 ±  8%     +26.2%       1998 ± 12%  sched_debug.cpu#80.ttwu_count
        11 ± 12%     -34.1%          7 ± 15%  sched_debug.cfs_rq[9]:/.load
        11 ± 19%     -34.1%          7 ± 15%  sched_debug.cpu#9.cpu_load[0]
        11 ± 12%     -34.1%          7 ± 15%  sched_debug.cpu#9.load
      3032 ±  5%     +32.3%       4013 ± 11%  sched_debug.cpu#126.curr->pid
      6039 ± 49%     -49.8%       3033 ± 13%  sched_debug.cpu#90.ttwu_count
       442 ±  9%     -34.9%        288 ± 11%  sched_debug.cpu#62.ttwu_local
       954 ± 31%     +68.0%       1604 ± 46%  sched_debug.cpu#81.ttwu_count
       251 ± 19%     +39.5%        350 ± 28%  sched_debug.cpu#36.sched_goidle
       778 ± 13%     +25.7%        978 ± 10%  sched_debug.cpu#53.nr_switches
       406 ±  9%     -35.1%        263 ±  9%  sched_debug.cpu#41.ttwu_local
     12511 ± 15%     +18.7%      14847 ± 13%  numa-meminfo.node6.Active
       646 ± 14%     +38.6%        896 ± 27%  sched_debug.cpu#115.nr_switches
      1802 ± 15%     +39.0%       2505 ± 13%  sched_debug.cpu#86.nr_switches
       838 ± 16%     +38.2%       1159 ± 12%  sched_debug.cpu#86.sched_goidle
       655 ± 14%     +38.1%        904 ± 27%  sched_debug.cpu#115.sched_count
       660 ± 14%     +51.7%       1001 ± 42%  sched_debug.cpu#119.sched_count
         8 ± 14%     +38.0%         11 ± 19%  sched_debug.cpu#41.cpu_load[1]
       527 ± 16%     -34.3%        346 ± 18%  sched_debug.cpu#96.ttwu_local
       354 ± 22%     +60.8%        569 ± 22%  sched_debug.cpu#18.ttwu_local
         8 ± 10%     -25.0%          6 ±  0%  sched_debug.cpu#17.cpu_load[0]
         8 ± 10%     -25.0%          6 ±  0%  sched_debug.cfs_rq[17]:/.runnable_load_avg
         8 ±  5%     -28.0%          6 ±  0%  sched_debug.cpu#17.cpu_load[1]
         8 ±  5%     -28.0%          6 ±  0%  sched_debug.cpu#17.cpu_load[2]
         8 ± 10%     -25.0%          6 ±  0%  sched_debug.cpu#17.cpu_load[4]
         8 ± 10%     -25.0%          6 ±  0%  sched_debug.cfs_rq[17]:/.load
         8 ± 10%     -25.0%          6 ±  0%  sched_debug.cpu#17.cpu_load[3]
         8 ± 10%     -25.0%          6 ±  0%  sched_debug.cpu#17.load
     10464 ±  8%     -32.9%       7025 ± 17%  sched_debug.cfs_rq[94]:/.avg->runnable_avg_sum
       227 ±  8%     -33.1%        152 ± 17%  sched_debug.cfs_rq[94]:/.tg_runnable_contrib
      3279 ± 11%     +13.7%       3729 ± 11%  sched_debug.cpu#125.curr->pid
      2956 ±  4%     +17.5%       3474 ±  9%  sched_debug.cpu#127.curr->pid
       637 ±  9%     -25.4%        475 ± 18%  sched_debug.cpu#62.ttwu_count
      9187 ±  1%     +21.9%      11201 ± 10%  sched_debug.cfs_rq[127]:/.avg->runnable_avg_sum
         8 ±  5%     -19.2%          7 ± 10%  sched_debug.cpu#9.cpu_load[4]
       200 ±  1%     +21.4%        243 ± 10%  sched_debug.cfs_rq[127]:/.tg_runnable_contrib
       611 ±  8%     -16.8%        509 ±  6%  sched_debug.cpu#41.sched_goidle
     23669 ± 14%     +18.0%      27933 ± 10%  numa-meminfo.node3.Slab
    753533 ± 11%     +16.9%     880591 ± 14%  sched_debug.cfs_rq[125]:/.min_vruntime
    675885 ±  5%     +36.5%     922395 ±  8%  sched_debug.cfs_rq[126]:/.min_vruntime
      1557 ±  0%     -18.8%       1264 ±  6%  sched_debug.cpu#41.nr_switches
      1572 ±  0%     -18.8%       1277 ±  5%  sched_debug.cpu#41.sched_count
         9 ±  4%     -22.4%          7 ± 14%  sched_debug.cpu#9.cpu_load[2]
        10 ±  8%     -25.0%          7 ± 14%  sched_debug.cpu#9.cpu_load[1]
       428 ± 19%     -27.8%        309 ± 15%  sched_debug.cpu#54.sched_goidle
      3838 ±  1%     -21.8%       3000 ±  5%  sched_debug.cpu#122.curr->pid
       772 ±  3%     +15.6%        893 ± 12%  numa-vmstat.node4.nr_alloc_batch
       266 ± 13%     +38.2%        368 ± 35%  sched_debug.cpu#119.sched_goidle
      1763 ±  2%     -16.5%       1472 ± 10%  numa-vmstat.node6.nr_slab_reclaimable
      7054 ±  2%     -16.5%       5889 ±  9%  numa-meminfo.node6.SReclaimable
   1800.43 ±  0%     -10.3%    1615.50 ±  0%  time.user_time
      1018 ± 17%     -23.4%        780 ± 11%  sched_debug.cpu#54.nr_switches
     10904 ±  6%     -15.2%       9246 ±  6%  sched_debug.cfs_rq[122]:/.avg->runnable_avg_sum
       237 ±  6%     -15.3%        201 ±  6%  sched_debug.cfs_rq[122]:/.tg_runnable_contrib
       461 ±  4%     +36.1%        627 ± 27%  sched_debug.cpu#43.sched_goidle
       488 ±  9%     -20.3%        389 ± 11%  sched_debug.cpu#58.ttwu_local
       974 ± 18%     -25.6%        725 ± 11%  numa-meminfo.node7.PageTables
     57552 ±  7%     +14.9%      66133 ±  6%  sched_debug.cfs_rq[69]:/.exec_clock
      3658 ±  6%     +15.7%       4230 ± 11%  sched_debug.cpu#80.nr_switches
       794 ±  6%      +9.3%        867 ±  7%  numa-vmstat.node0.nr_alloc_batch
       331 ± 14%     +26.6%        419 ± 12%  sched_debug.cpu#59.ttwu_local
     75174 ±  6%     -12.1%      66087 ±  6%  sched_debug.cfs_rq[101]:/.exec_clock
       164 ± 10%     +25.4%        206 ± 20%  sched_debug.cpu#114.ttwu_local
   1618315 ±  5%      +2.4%    1656815 ±  5%  sched_debug.cfs_rq[58]:/.min_vruntime
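
The %change column is the relative change of the new commit's mean against the parent's mean. Below is a minimal shell sketch of that arithmetic. It assumes the plain formula 100 * (new - old) / old, and pct_change is a hypothetical helper, not part of lkp-tests. It reproduces two rows of the futex4 table:

```shell
#!/bin/sh
# Sketch: %change = 100 * (new - old) / old, where "old" is the mean for the
# parent commit 0429fbc0bdc297d6 and "new" the mean for 76835b0ebf8a.
# pct_change is a hypothetical helper name used only for illustration.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", 100 * (new - old) / old }'
}

pct_change 6314259 5800079   # will-it-scale.per_thread_ops -> -8.1%
pct_change 1800.43 1615.50   # time.user_time               -> -10.3%
```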

testbox/testcase/testparams: lkp-wsx01/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
   9211097 ±  0%     -10.2%    8268370 ±  0%  will-it-scale.per_thread_ops
   9204866 ±  0%     -10.2%    8266680 ±  0%  will-it-scale.per_process_ops
      0.63 ±  0%      +6.3%       0.66 ±  0%  will-it-scale.scalability
      2.01 ±  1%    +365.3%       9.35 ±  1%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
         1 ± 34%    +200.0%          4 ± 17%  sched_debug.cpu#63.nr_uninterruptible
       404 ± 23%    +143.3%        984 ± 27%  sched_debug.cpu#48.sched_count
       434 ± 32%     -68.0%        139 ± 20%  sched_debug.cpu#61.ttwu_local
      5335 ± 48%     -54.4%       2433 ± 43%  sched_debug.cpu#20.sched_goidle
       576 ± 26%     -46.8%        306 ± 34%  sched_debug.cpu#55.ttwu_count
       161 ± 27%    +159.0%        418 ± 29%  sched_debug.cpu#48.sched_goidle
     11216 ± 46%     -49.6%       5648 ± 38%  sched_debug.cpu#20.nr_switches
     11237 ± 46%     -49.5%       5669 ± 38%  sched_debug.cpu#20.sched_count
       395 ± 23%    +146.2%        972 ± 27%  sched_debug.cpu#48.nr_switches
      1461 ± 31%     -46.9%        776 ± 23%  sched_debug.cpu#42.ttwu_local
       926 ± 19%     -41.0%        546 ± 24%  sched_debug.cpu#61.ttwu_count
       856 ± 34%     -50.2%        426 ± 31%  sched_debug.cpu#58.ttwu_count
       498 ± 23%     -54.4%        227 ± 25%  sched_debug.cpu#55.sched_goidle
      3667 ± 25%     -38.3%       2263 ± 18%  sched_debug.cpu#42.nr_switches
      3680 ± 25%     -38.1%       2279 ± 18%  sched_debug.cpu#42.sched_count
      1593 ± 17%     -45.1%        875 ± 20%  sched_debug.cpu#61.nr_switches
       155 ± 36%    +182.5%        438 ± 40%  sched_debug.cpu#48.ttwu_count
      1604 ± 17%     -44.8%        885 ± 20%  sched_debug.cpu#61.sched_count
      1245 ± 27%     -52.5%        591 ± 21%  sched_debug.cpu#55.sched_count
       534 ± 11%     +63.0%        871 ± 35%  sched_debug.cpu#45.nr_switches
      1232 ± 27%     -52.8%        581 ± 21%  sched_debug.cpu#55.nr_switches
       545 ± 11%     +62.0%        883 ± 35%  sched_debug.cpu#45.sched_count
       402 ± 22%     +45.4%        585 ± 29%  sched_debug.cpu#70.sched_goidle
       595 ± 21%    +108.8%       1242 ± 40%  sched_debug.cpu#70.ttwu_count
       575 ± 24%     -39.7%        346 ± 21%  sched_debug.cpu#61.sched_goidle
        10 ± 26%     -44.0%          5 ± 18%  sched_debug.cpu#72.cpu_load[0]
      7.76 ±  0%     +15.3%       8.95 ±  0%  perf-profile.cpu-cycles.get_futex_key.futex_wake.do_futex.sys_futex.system_call_fastpath
      3149 ± 11%     -15.0%       2677 ± 10%  sched_debug.cpu#71.curr->pid
      4.52 ±  2%     -11.5%       4.00 ±  1%  perf-profile.cpu-cycles.do_futex.sys_futex.system_call_fastpath.syscall
   1401.34 ±  0%     -10.1%    1260.10 ±  0%  time.user_time
      4.92 ±  0%     -11.5%       4.36 ±  1%  perf-profile.cpu-cycles.sys_futex.system_call_fastpath.syscall
     12.56 ±  0%     -12.2%      11.03 ±  1%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
     11.31 ±  0%     -11.2%      10.05 ±  0%  perf-profile.cpu-cycles.system_call.syscall
      3.76 ±  0%     -13.5%       3.25 ±  0%  perf-profile.cpu-cycles.sysret_check.syscall
      1.21 ±  1%     -11.4%       1.08 ±  1%  perf-profile.cpu-cycles.drop_futex_key_refs.isra.12.futex_wake.do_futex.sys_futex.system_call_fastpath
     20.50 ±  0%     -10.9%      18.25 ±  0%  perf-profile.cpu-cycles.syscall
      3007 ±  6%     -11.8%       2654 ± 13%  sched_debug.cpu#38.curr->pid
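
Each comparison row has a fixed layout. That makes it possible to recheck a table mechanically. The sketch below assumes rows of the form "OLD ± S%  CHANGE%  NEW ± S%  METRIC", and recompute is a hypothetical helper. It recomputes the headline rows of the lkp-wsx01 futex3 table:

```shell
#!/bin/sh
# Sketch: recompute %change for rows laid out as
#   OLD ± S%   CHANGE%   NEW ± S%   METRIC
# awk's default whitespace splitting puts OLD in $1, NEW in $5, METRIC in $8.
recompute() {
    awk '{ printf "%s %+.1f%%\n", $8, 100 * ($5 - $1) / $1 }'
}

# Rows copied from the lkp-wsx01 futex3 table above.
recompute <<'EOF'
   9211097 ±  0%     -10.2%    8268370 ±  0%  will-it-scale.per_thread_ops
   9204866 ±  0%     -10.2%    8266680 ±  0%  will-it-scale.per_process_ops
   1401.34 ±  0%     -10.1%    1260.10 ±  0%  time.user_time
EOF
```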

testbox/testcase/testparams: nhm4/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
          2:5          -40%            :5     kmsg.Spurious_LAPIC_timer_interrupt_on_cpu
  11324367 ±  0%     -12.0%    9969101 ±  0%  will-it-scale.per_thread_ops
  11271283 ±  0%     -12.1%    9911001 ±  0%  will-it-scale.per_process_ops
      0.66 ±  0%      +9.6%       0.72 ±  0%  will-it-scale.scalability
      1.67 ±  1%    +509.7%      10.17 ±  3%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
      1127 ± 31%     -42.3%        651 ±  8%  sched_debug.cfs_rq[3]:/.blocked_load_avg
       799 ± 19%     -34.8%        521 ± 26%  sched_debug.cfs_rq[2]:/.blocked_load_avg
       906 ± 17%     -30.8%        627 ± 21%  sched_debug.cfs_rq[2]:/.tg_load_contrib
      1245 ± 28%     -38.8%        762 ±  8%  sched_debug.cfs_rq[3]:/.tg_load_contrib
      0.47 ±  7%    +119.7%       1.02 ±  5%  perf-profile.cpu-cycles.ret_from_sys_call.syscall
     16.48 ±  1%     +56.0%      25.71 ±  2%  perf-profile.cpu-cycles.futex_wake.do_futex.sys_futex.system_call_fastpath.syscall
       153 ± 13%     -35.3%         99 ± 22%  sched_debug.cpu#6.load
       969 ± 22%     +48.4%       1438 ± 20%  sched_debug.cfs_rq[4]:/.tg_load_contrib
       133 ± 11%     -28.7%         95 ± 22%  sched_debug.cfs_rq[6]:/.load
       827 ± 26%     +55.5%       1286 ± 23%  sched_debug.cfs_rq[4]:/.blocked_load_avg
      6.55 ±  3%     +22.3%       8.01 ±  1%  perf-profile.cpu-cycles.get_futex_key.futex_wake.do_futex.sys_futex.system_call_fastpath
        70 ± 10%     +21.7%         85 ±  9%  sched_debug.cpu#2.cpu_load[3]
        68 ± 12%     +23.5%         84 ±  9%  sched_debug.cpu#2.cpu_load[4]
       114 ±  4%     -21.6%         89 ± 12%  sched_debug.cpu#6.cpu_load[1]
       103 ±  5%     -21.8%         81 ±  9%  sched_debug.cpu#6.cpu_load[2]
    214345 ± 13%     +31.5%     281773 ± 13%  sched_debug.cfs_rq[2]:/.min_vruntime
     37071 ± 15%     +36.8%      50702 ± 15%  sched_debug.cfs_rq[2]:/.exec_clock
      1.22 ±  4%     -15.5%       1.03 ±  9%  perf-profile.cpu-cycles.drop_futex_key_refs.isra.12.do_futex.sys_futex.system_call_fastpath.syscall
     42.44 ±  0%     +13.8%      48.30 ±  2%  perf-profile.cpu-cycles.do_futex.sys_futex.system_call_fastpath.syscall
     50.65 ±  0%     +11.9%      56.69 ±  2%  perf-profile.cpu-cycles.system_call_fastpath.syscall
    199.04 ±  0%     -21.5%     156.27 ±  2%  time.user_time
     48.57 ±  0%     +11.3%      54.04 ±  2%  perf-profile.cpu-cycles.sys_futex.system_call_fastpath.syscall
     38234 ±  5%     -17.6%      31518 ±  1%  softirqs.RCU
     82318 ± 11%     -20.7%      65237 ± 13%  sched_debug.cpu#1.nr_load_updates
        95 ±  7%     -19.9%         76 ±  8%  sched_debug.cpu#6.cpu_load[4]
     12.54 ±  1%     -14.4%      10.73 ±  3%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
        98 ±  7%     -20.5%         78 ±  8%  sched_debug.cpu#6.cpu_load[3]
     11.27 ±  1%     -16.4%       9.43 ±  1%  perf-profile.cpu-cycles.system_call.syscall
    324067 ±  8%     -18.2%     265084 ± 14%  sched_debug.cfs_rq[6]:/.min_vruntime
     58381 ± 10%     -23.2%      44812 ± 18%  sched_debug.cfs_rq[6]:/.exec_clock
      3.23 ±  2%     -15.0%       2.75 ±  2%  perf-profile.cpu-cycles.sysret_check.syscall
     18.21 ±  1%     -13.6%      15.74 ±  2%  perf-profile.cpu-cycles.hash_futex.do_futex.sys_futex.system_call_fastpath.syscall
      3053 ±  8%     -18.2%       2499 ± 13%  sched_debug.cpu#6.curr->pid
     21140 ±  9%     +17.5%      24849 ±  8%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
     27368 ±  7%     -13.6%      23638 ±  8%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
       596 ±  7%     -13.6%        515 ±  8%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
     39113 ±  2%     +10.7%      43292 ±  4%  cpuidle.C6-NHM.usage
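
The first nhm4 row is a count rather than a mean. This sketch assumes the usual LKP reading: "2:5" means the message appeared in 2 of 5 runs, and an empty left-hand count means 0. Under that reading, the -40% is the change in reproduction rate in percentage points. Here repro_change is a hypothetical helper:

```shell
#!/bin/sh
# Sketch, assuming "a:b" means a message was seen in a of b runs.
# Result is the change in reproduction rate, in percentage points.
repro_change() {
    # $1/$2: runs with the message / total runs on the parent commit
    # $3/$4: runs with the message / total runs on the tested commit
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" \
        'BEGIN { printf "%+.0f%%\n", 100 * (c / d - a / b) }'
}

repro_change 2 5 0 5   # kmsg.Spurious_LAPIC_timer_interrupt_on_cpu -> -40%
```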

testbox/testcase/testparams: ivb42/will-it-scale/performance-futex3

0429fbc0bdc297d6  76835b0ebf8a7fe85beb03c751  
----------------  --------------------------  
  11973558 ±  0%     -16.3%   10023607 ±  0%  will-it-scale.per_thread_ops
  11960916 ±  0%     -16.5%    9989502 ±  0%  will-it-scale.per_process_ops
      0.61 ±  0%      +9.2%       0.66 ±  0%  will-it-scale.scalability
      0.83 ±  4%   +1394.0%      12.45 ±  0%  perf-profile.cpu-cycles.get_futex_key_refs.isra.11.futex_wake.do_futex.sys_futex.system_call_fastpath
       271 ± 46%    +125.9%        613 ± 17%  sched_debug.cpu#38.ttwu_local
       327 ± 42%    +155.8%        838 ± 41%  sched_debug.cpu#45.ttwu_local
       624 ± 25%    +137.2%       1482 ± 30%  sched_debug.cpu#27.sched_count
       492 ± 20%    +116.3%       1065 ± 12%  sched_debug.cpu#38.sched_goidle
       280 ± 14%    +193.7%        822 ± 43%  sched_debug.cpu#27.ttwu_count
      1317 ± 23%     +99.7%       2630 ± 12%  sched_debug.cpu#38.nr_switches
       146 ± 38%     +34.1%        196 ± 36%  sched_debug.cfs_rq[39]:/.blocked_load_avg
       163 ± 37%     +28.0%        209 ± 34%  sched_debug.cfs_rq[39]:/.tg_load_contrib
       613 ± 26%    +139.4%       1469 ± 31%  sched_debug.cpu#27.nr_switches
      1560 ± 31%     +37.3%       2142 ± 29%  sched_debug.cpu#37.nr_switches
       277 ± 27%     +82.0%        504 ± 18%  sched_debug.cpu#27.sched_goidle
     15.50 ±  0%     +96.1%      30.40 ±  0%  perf-profile.cpu-cycles.futex_wake.do_futex.sys_futex.system_call_fastpath.syscall
      1625 ± 38%     -56.3%        710 ± 12%  sched_debug.cpu#40.ttwu_local
      1528 ± 21%     +91.9%       2933 ± 35%  sched_debug.cpu#45.nr_switches
      1539 ± 21%     +91.4%       2945 ± 35%  sched_debug.cpu#45.sched_count
      5876 ± 18%     -35.8%       3771 ± 32%  sched_debug.cpu#40.sched_count
       447 ± 35%     +59.3%        712 ± 22%  sched_debug.cpu#44.sched_goidle
        16 ± 35%     -35.9%         10 ±  4%  sched_debug.cpu#41.cpu_load[0]
       948 ± 21%     +49.5%       1418 ± 21%  sched_debug.cpu#45.ttwu_count
      5504 ±  5%     -36.8%       3479 ± 14%  sched_debug.cpu#10.sched_count
         7 ± 34%     +94.3%         14 ± 43%  sched_debug.cpu#19.cpu_load[1]
         7 ± 34%     +56.8%         11 ± 17%  sched_debug.cpu#19.cpu_load[3]
         7 ± 34%     +70.5%         12 ± 30%  sched_debug.cpu#19.cpu_load[2]
        15 ± 30%     -33.2%         10 ±  4%  sched_debug.cpu#41.cpu_load[1]
      7.20 ±  0%     +52.5%      10.98 ±  1%  perf-profile.cpu-cycles.get_futex_key.futex_wake.do_futex.sys_futex.system_call_fastpath
      3947 ± 12%     -44.3%       2199 ± 44%  sched_debug.cpu#14.ttwu_count
      2594 ± 21%     +38.4%       3591 ± 13%  sched_debug.cpu#19.curr->pid
      1120 ± 31%     +55.1%       1738 ± 27%  sched_debug.cpu#43.ttwu_count
       270 ± 13%     +14.0%        307 ±  8%  numa-vmstat.node1.nr_mlock
       270 ± 13%     +14.0%        307 ±  8%  numa-vmstat.node1.nr_unevictable
      1082 ± 13%     +13.8%       1232 ±  8%  numa-meminfo.node1.Mlocked
      1082 ± 13%     +13.8%       1232 ±  8%  numa-meminfo.node1.Unevictable
   1461464 ± 17%     -16.1%    1226347 ± 10%  sched_debug.cfs_rq[43]:/.min_vruntime
      3199 ±  1%     +12.5%       3600 ± 11%  sched_debug.cpu#44.curr->pid
      1.27 ±  1%     -19.5%       1.02 ±  1%  perf-profile.cpu-cycles.drop_futex_key_refs.isra.12.do_futex.sys_futex.system_call_fastpath.syscall
     37.32 ±  0%     +30.8%      48.80 ±  0%  perf-profile.cpu-cycles.do_futex.sys_futex.system_call_fastpath.syscall
     44.13 ±  0%     +23.4%      54.44 ±  0%  perf-profile.cpu-cycles.system_call_fastpath.syscall
    995.56 ±  0%     -15.9%     836.90 ±  0%  time.user_time
        14 ± 28%     -28.5%         10 ±  4%  sched_debug.cpu#41.cpu_load[2]
        13 ± 14%     -23.1%         10 ±  4%  sched_debug.cfs_rq[41]:/.runnable_load_avg
        13 ± 16%     -21.2%         10 ±  4%  sched_debug.cfs_rq[41]:/.load
      5.09 ±  2%     -18.1%       4.17 ±  0%  perf-profile.cpu-cycles.testcase
     42.39 ±  0%     +25.1%      53.01 ±  0%  perf-profile.cpu-cycles.sys_futex.system_call_fastpath.syscall
     20179 ± 46%     +40.0%      28253 ±  5%  sched_debug.cfs_rq[19]:/.exec_clock
     12.68 ±  0%     -20.3%      10.11 ±  0%  perf-profile.cpu-cycles.system_call_after_swapgs.syscall
     13.54 ±  0%     -18.0%      11.10 ±  1%  perf-profile.cpu-cycles.system_call.syscall
      2.55 ±  0%     -18.3%       2.08 ±  0%  perf-profile.cpu-cycles.sysret_check.syscall
      4928 ± 21%     -32.7%       3319 ± 19%  sched_debug.cpu#40.nr_switches
      1547 ±  4%     +12.9%       1746 ±  5%  slabinfo.sock_inode_cache.num_objs
      1547 ±  4%     +12.9%       1746 ±  5%  slabinfo.sock_inode_cache.active_objs
     16.28 ±  0%     -15.2%      13.80 ±  1%  perf-profile.cpu-cycles.hash_futex.do_futex.sys_futex.system_call_fastpath.syscall
     90590 ±  1%     -12.1%      79668 ±  5%  meminfo.DirectMap4k
      4143 ±  5%     -11.0%       3689 ±  8%  sched_debug.cpu#37.curr->pid
       956 ±  6%      +8.3%       1036 ±  4%  slabinfo.RAW.active_objs
       956 ±  6%      +8.3%       1036 ±  4%  slabinfo.RAW.num_objs
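
The displayed means are rounded, so recomputing %change from them is only approximate. The gap is largest for small values such as the get_futex_key_refs share of cycles. The sketch below shows this for the three futex3 boxes, using the same assumed row layout as before; recheck is a hypothetical helper, and the last field is relabelled with the test box name. The conclusion that the report works from unrounded means is an inference, not something it states.

```shell
#!/bin/sh
# Recompute %change for the get_futex_key_refs rows from the rounded means
# shown in the report. OLD is $1, the reported CHANGE is $4, NEW is $5;
# the last field ($8) is the test box here, not the metric name.
recheck() {
    awk '{ printf "%s: reported %s, recomputed %+.1f%%\n", $8, $4, 100 * ($5 - $1) / $1 }'
}

recheck <<'EOF'
      2.01 ±  1%    +365.3%       9.35 ±  1%  lkp-wsx01
      1.67 ±  1%    +509.7%      10.17 ±  3%  nhm4
      0.83 ±  4%   +1394.0%      12.45 ±  0%  ivb42
EOF
```

Either way, the share of cycles spent in get_futex_key_refs grows by roughly 4.7x (lkp-wsx01) to 15x (ivb42).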

lkp-wsx01: Westmere-EX
Memory: 128G

wsm: Westmere
Memory: 6G

lkp-snb01: Sandy Bridge-EP
Memory: 32G

lkp-sbx04: Sandy Bridge-EX
Memory: 64G

nhm4: Nehalem
Memory: 4G

lkp-g5: Westmere-EX
Memory: 2048G

ivb42: Ivytown Ivy Bridge-EP
Memory: 64G

To reproduce:

	apt-get install ruby
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
	cd lkp-tests
	bin/setup-local job.yaml # the job file attached in this email
	bin/run-local   job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang


Attachment: "job.yaml" (text/plain, 1528 bytes)

Attachment: "reproduce" (text/plain, 910 bytes)

_______________________________________________
LKP mailing list
LKP@...ux.intel.com
