Open Source and information security mailing list archives
Date:	Thu, 28 Jul 2016 14:36:17 +0800
From:	kernel test robot <xiaolong.ye@...el.com>
To:	Mel Gorman <mgorman@...hsingularity.net>
Cc:	Stephen Rothwell <sfr@...b.auug.org.au>,
	Vlastimil Babka <vbabka@...e.cz>,
	Hillf Danton <hillf.zj@...baba-inc.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Michal Hocko <mhocko@...nel.org>,
	Minchan Kim <minchan@...nel.org>,
	Rik van Riel <riel@...riel.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp] [mm, page_alloc]  68df942cc6: pbzip2.throughput -6.3%
 regression


FYI, we noticed a -6.3% regression of pbzip2.throughput due to commit:

commit 68df942cc66426c0922a91768bed1e05a0e937ab ("mm, page_alloc: remove fair zone allocation policy")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

in testcase: pbzip2
on test machine: 48 threads 2 sockets Xeon E5 (Ivytown Ivy Bridge-EP) with 64G memory
with the following parameters:

	nr_threads: 100%
	blocksize: 900K
	cpufreq_governor: performance
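For reference, the parameters above roughly translate into the pbzip2 invocation sketched below. This is an approximation, not the exact command (which is generated by the lkp pbzip2 test script from the attached job.yaml): nr_threads=100% means one compression thread per online CPU, and a 900K block size corresponds to pbzip2's -b9 flag (block size in units of 100 KB). The input path is an illustrative placeholder.

```shell
# Hedged sketch of the benchmark command implied by the job parameters above.
# The real invocation is produced by lkp-tests from job.yaml.
nthreads=$(nproc)                 # nr_threads: 100% -> one worker per CPU (48 on ivb42)
blockflag="-b9"                   # blocksize: 900K -> -b9 (units of 100 KB)
cmd="pbzip2 -p${nthreads} ${blockflag} -c"
echo "$cmd /path/to/input > /dev/null"
```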

In addition, the commit also has a significant impact on the following test:

+------------------+------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput -4.2% regression                       |
| test machine     | 48 threads 2 sockets Xeon E5 (Ivytown Ivy Bridge-EP) with 64G memory   |
| test parameters  | cpufreq_governor=performance                                           |
|                  | ipc=socket                                                             |
|                  | mode=threads                                                           |
|                  | nr_threads=1600%                                                       |
+------------------+------------------------------------------------------------------------+


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Details are as follows:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
blocksize/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
  900K/gcc-6/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/ivb42/pbzip2

commit: 
  e67589fcc1 ("mm, vmscan: add classzone information to tracepoints")
  68df942cc6 ("mm, page_alloc: remove fair zone allocation policy")

e67589fcc119127c 68df942cc66426c0922a91768b 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  99347521 ±  0%      -6.3%   93098631 ±  0%  pbzip2.throughput
    174.24 ±100%    +134.3%     408.25 ±  1%  pbzip2.time.system_time
     18.75 ±  0%      -5.5%      17.73 ±  0%  turbostat.RAMWatt
    104514 ±  1%      -5.1%      99188 ±  0%  vmstat.system.in
     40698 ±  3%   +1087.6%     483328 ±  0%  meminfo.Active(anon)
    488081 ±  0%    -100.0%      17.75 ±  2%  meminfo.Active(file)
  26091843 ±  3%     -15.2%   22114482 ±  4%  numa-numastat.node0.local_node
  26091945 ±  3%     -15.2%   22114587 ±  4%  numa-numastat.node0.numa_hit
     10292 ±  2%    -100.0%       0.00 ± -1%  proc-vmstat.nr_alloc_batch
   5482540 ±  0%    -100.0%       0.00 ±  0%  proc-vmstat.pgalloc_dma32
    759939 ±  3%     +25.5%     954027 ± 13%  cpuidle.C1E-IVT.time
      6624 ±  7%     +27.8%       8464 ± 12%  cpuidle.C1E-IVT.usage
   1899377 ±  5%     +22.1%    2318726 ± 14%  cpuidle.C3-IVT.time
      8853 ±  4%     +23.4%      10929 ± 12%  cpuidle.C3-IVT.usage
      4931 ±  2%    -100.0%       0.00 ± -1%  numa-vmstat.node0.nr_alloc_batch
  12999778 ±  2%     -13.8%   11199799 ±  5%  numa-vmstat.node0.numa_hit
  12999732 ±  2%     -13.8%   11199745 ±  5%  numa-vmstat.node0.numa_local
      5517 ±  3%    -100.0%       0.00 ± -1%  numa-vmstat.node1.nr_alloc_batch
     25.53 ±  2%     +17.6%      30.04 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
     10.11 ±  4%     +21.5%      12.29 ±  9%  sched_debug.cpu.clock.stddev
     10.11 ±  4%     +21.5%      12.29 ±  9%  sched_debug.cpu.clock_task.stddev
    383.50 ±  5%     +23.4%     473.36 ±  5%  sched_debug.cpu.sched_goidle.avg
      1010 ±  7%     +78.1%       1799 ± 34%  sched_debug.cpu.sched_goidle.max
    191.33 ± 16%     +27.9%     244.71 ±  8%  sched_debug.cpu.sched_goidle.min
    125.22 ±  9%     +86.8%     233.95 ± 33%  sched_debug.cpu.sched_goidle.stddev
     38132 ± 11%     -21.2%      30039 ± 11%  sched_debug.cpu.ttwu_count.max
     36218 ± 13%     -21.6%      28407 ± 12%  sched_debug.cpu.ttwu_local.max
 3.062e+11 ±  0%      -6.0%  2.878e+11 ±  0%  perf-stat.L1-dcache-load-misses
 5.255e+12 ±  0%      -6.3%  4.925e+12 ±  0%  perf-stat.L1-dcache-loads
 1.992e+10 ±  0%      -7.9%  1.835e+10 ±  1%  perf-stat.L1-dcache-prefetch-misses
 1.591e+11 ±  0%      -6.0%  1.496e+11 ±  1%  perf-stat.L1-dcache-store-misses
 3.128e+12 ±  0%      -6.4%  2.929e+12 ±  0%  perf-stat.L1-dcache-stores
 2.014e+09 ±  1%      +6.3%  2.142e+09 ±  1%  perf-stat.L1-icache-load-misses
 2.943e+10 ±  0%      -4.7%  2.804e+10 ±  0%  perf-stat.LLC-load-misses
 1.164e+11 ±  0%      -5.6%  1.099e+11 ±  0%  perf-stat.LLC-loads
 9.141e+09 ±  0%      -4.5%  8.731e+09 ±  1%  perf-stat.LLC-prefetch-misses
 1.243e+10 ±  1%      -4.4%  1.188e+10 ±  1%  perf-stat.LLC-prefetches
 3.776e+10 ±  1%      -5.3%  3.577e+10 ±  0%  perf-stat.LLC-store-misses
 7.226e+10 ±  0%      -5.1%   6.86e+10 ±  0%  perf-stat.LLC-stores
  2.85e+12 ±  0%      -6.8%  2.658e+12 ±  0%  perf-stat.branch-instructions
 1.564e+11 ±  0%      -6.4%  1.465e+11 ±  0%  perf-stat.branch-load-misses
 2.825e+12 ±  0%      -6.0%  2.655e+12 ±  0%  perf-stat.branch-loads
 1.561e+11 ±  0%      -5.9%   1.47e+11 ±  0%  perf-stat.branch-misses
 6.636e+10 ±  0%      -4.3%  6.351e+10 ±  0%  perf-stat.cache-misses
 1.879e+11 ±  0%      -4.7%   1.79e+11 ±  0%  perf-stat.cache-references
     63404 ±  1%     -18.5%      51690 ±  2%  perf-stat.cpu-migrations
  5.25e+12 ±  0%      -6.2%  4.924e+12 ±  0%  perf-stat.dTLB-loads
 3.129e+12 ±  0%      -6.4%  2.928e+12 ±  0%  perf-stat.dTLB-stores
 2.054e+13 ±  0%      -6.5%   1.92e+13 ±  0%  perf-stat.instructions
  39561733 ±  5%      -6.4%   37045620 ±  2%  perf-stat.minor-faults
 2.943e+10 ±  0%      -4.2%  2.818e+10 ±  0%  perf-stat.node-loads
 3.746e+10 ±  0%      -4.6%  3.572e+10 ±  0%  perf-stat.node-stores
  39561720 ±  5%      -6.4%   37045614 ±  2%  perf-stat.page-faults
 3.504e+13 ±  0%      +1.4%  3.553e+13 ±  0%  perf-stat.stalled-cycles-frontend




                                   pbzip2.throughput

  1.2e+08 ++----------------------------------------------------------------+
          |                                                                 |
    1e+08 ++**.**.**    **.**.**.**.**  *.**.**.*     **.**.**    **.**.**.**
          O OO OO OO OO OO OO OO OO OO OO OO OO :     :      :    :         |
          | :      :    :            :  :       :     :      :    :         |
    8e+07 ++:       :   :             : :       :     :       :   :         |
          |:        :  :              : :       :    :        :  :          |
    6e+07 ++        :  :              : :       :    :        :  :          |
          |:        :  :              : :        :   :        :  :          |
    4e+07 ++        :  :              ::         :   :        :  :          |
          |:        :  :              ::         :   :        :  :          |
          |:         : :               :         :   :         : :          |
    2e+07 ++         ::                :         :  :          ::           |
          |          ::                :         :  :          ::           |
        0 *+---------**----------------*---------*-**----------**-----------+



	[*] bisect-good sample
	[O] bisect-bad  sample

***************************************************************************************************
ivb42: 48 threads 2 sockets Xeon E5 (Ivytown Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/ipc/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-6/performance/socket/x86_64-rhel/threads/1600%/debian-x86_64-2015-02-07.cgz/ivb42/hackbench

commit: 
  e67589fcc1 ("mm, vmscan: add classzone information to tracepoints")
  68df942cc6 ("mm, page_alloc: remove fair zone allocation policy")

e67589fcc119127c 68df942cc66426c0922a91768b 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    147426 ±  0%      -4.2%     141178 ±  0%  hackbench.throughput
     18.16 ±  0%      -2.8%      17.65 ±  0%  turbostat.RAMWatt
      4306 ±  6%    -100.0%       0.00 ± -1%  numa-vmstat.node0.nr_alloc_batch
      3908 ±  6%    -100.0%       0.00 ± -1%  numa-vmstat.node1.nr_alloc_batch
      8210 ±  3%    -100.0%       0.00 ± -1%  proc-vmstat.nr_alloc_batch
    610148 ±  6%    -100.0%       0.00 ±  0%  proc-vmstat.pgalloc_dma32
    520899 ±  0%     -45.4%     284616 ±  1%  meminfo.Active
     32878 ±  6%    +765.6%     284607 ±  1%  meminfo.Active(anon)
    488019 ±  0%    -100.0%       8.25 ±  5%  meminfo.Active(file)
    285266 ±  0%     +71.1%     488045 ±  0%  meminfo.Inactive(file)
      1.94 ±142%    +701.7%      15.55 ± 46%  perf-profile.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
      2.48 ±111%    +526.5%      15.55 ± 46%  perf-profile.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
     47.95 ± 33%     -45.6%      26.09 ± 15%  perf-profile.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
     50.02 ± 35%     -47.8%      26.09 ± 15%  perf-profile.cycles-pp.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read.vfs_read
      3.64 ± 15%     -37.3%       2.28 ± 25%  sched_debug.cfs_rq:/.load_avg.min
  23999708 ±  2%     -11.0%   21367447 ±  2%  sched_debug.cfs_rq:/.min_vruntime.max
    879491 ±  6%     -17.0%     730204 ±  0%  sched_debug.cpu.avg_idle.avg
   2636078 ± 13%     -20.7%    2090915 ± 17%  sched_debug.cpu.avg_idle.max
      3.08 ± 18%     -43.2%       1.75 ± 27%  sched_debug.cpu.cpu_load[1].min
      3.17 ± 19%     -43.8%       1.78 ± 27%  sched_debug.cpu.cpu_load[2].min
      3.19 ± 20%     -43.3%       1.81 ± 26%  sched_debug.cpu.cpu_load[3].min
      3.25 ± 19%     -44.2%       1.81 ± 26%  sched_debug.cpu.cpu_load[4].min
   3987636 ±  1%     -10.7%    3561972 ±  3%  sched_debug.cpu.nr_switches.avg
   3311971 ±  5%     -11.2%    2940070 ±  2%  sched_debug.cpu.nr_switches.min
      0.11 ±  0%     +12.5%       0.12 ±  0%  sched_debug.rt_rq:/.rt_nr_running.max
 5.478e+11 ±  0%      -6.1%  5.142e+11 ±  1%  perf-stat.L1-dcache-load-misses
 1.175e+13 ±  0%      -6.3%    1.1e+13 ±  0%  perf-stat.L1-dcache-loads
  6.36e+10 ±  0%      -1.6%   6.26e+10 ±  0%  perf-stat.L1-dcache-prefetch-misses
 2.051e+11 ±  0%      -6.3%  1.921e+11 ±  0%  perf-stat.L1-dcache-store-misses
 8.779e+12 ±  0%      -6.2%  8.232e+12 ±  0%  perf-stat.L1-dcache-stores
 1.128e+11 ±  0%     -27.1%  8.226e+10 ±  3%  perf-stat.L1-icache-load-misses
  8.95e+10 ±  0%      -6.0%  8.412e+10 ±  0%  perf-stat.LLC-load-misses
 1.615e+11 ±  0%      -6.7%  1.506e+11 ±  0%  perf-stat.LLC-loads
 2.055e+10 ±  0%      -1.7%   2.02e+10 ±  1%  perf-stat.LLC-prefetch-misses
 3.953e+10 ±  0%      -4.2%  3.786e+10 ±  1%  perf-stat.LLC-prefetches
 4.142e+10 ±  0%      -6.4%  3.878e+10 ±  0%  perf-stat.LLC-store-misses
 5.819e+10 ±  0%      -7.1%  5.407e+10 ±  1%  perf-stat.LLC-stores
 6.894e+12 ±  0%      -6.3%  6.462e+12 ±  0%  perf-stat.branch-instructions
 1.295e+10 ±  2%      -8.0%  1.192e+10 ±  3%  perf-stat.branch-load-misses
 6.883e+12 ±  0%      -6.3%  6.449e+12 ±  0%  perf-stat.branch-loads
 1.197e+10 ±  0%      -9.6%  1.083e+10 ±  1%  perf-stat.branch-misses
 2.913e+12 ±  0%      -2.2%  2.848e+12 ±  0%  perf-stat.bus-cycles
 1.301e+11 ±  0%      -6.4%  1.218e+11 ±  0%  perf-stat.cache-misses
 2.366e+11 ±  0%      -6.6%  2.211e+11 ±  0%  perf-stat.cache-references
 4.045e+08 ±  1%      -5.4%  3.826e+08 ±  3%  perf-stat.context-switches
  8.73e+13 ±  0%      -2.2%  8.536e+13 ±  0%  perf-stat.cpu-cycles
 1.172e+13 ±  0%      -6.2%    1.1e+13 ±  0%  perf-stat.dTLB-loads
 8.782e+12 ±  0%      -6.3%  8.228e+12 ±  0%  perf-stat.dTLB-stores
 3.854e+13 ±  0%      -6.2%  3.613e+13 ±  0%  perf-stat.instructions
 6.257e+09 ±  2%      -7.5%  5.786e+09 ±  5%  perf-stat.node-load-misses
 8.953e+10 ±  0%      -6.0%  8.417e+10 ±  0%  perf-stat.node-loads
 2.075e+10 ±  0%      -2.0%  2.033e+10 ±  1%  perf-stat.node-prefetches
 4.114e+10 ±  0%      -6.6%  3.842e+10 ±  0%  perf-stat.node-stores
 7.845e+13 ±  0%      -2.3%  7.667e+13 ±  0%  perf-stat.ref-cycles


Thanks,
Xiaolong

View attachment "config-4.7.0-rc7-00249-g68df942" of type "text/plain" (151037 bytes)

View attachment "job.yaml" of type "text/plain" (3711 bytes)

View attachment "reproduce" of type "text/plain" (103 bytes)
