Message-ID: <20160728063617.GA21499@yexl-desktop>
Date: Thu, 28 Jul 2016 14:36:17 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>,
Vlastimil Babka <vbabka@...e.cz>,
Hillf Danton <hillf.zj@...baba-inc.com>,
Johannes Weiner <hannes@...xchg.org>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Michal Hocko <mhocko@...nel.org>,
Minchan Kim <minchan@...nel.org>,
Rik van Riel <riel@...riel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp] [mm, page_alloc] 68df942cc6: pbzip2.throughput -6.3% regression
FYI, we noticed a 6.3% regression in pbzip2.throughput due to commit:
commit 68df942cc66426c0922a91768bed1e05a0e937ab ("mm, page_alloc: remove fair zone allocation policy")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
in testcase: pbzip2
on test machine: 48 threads 2 sockets Xeon E5 (Ivytown Ivy Bridge-EP) with 64G memory
with the following parameters:
nr_threads: 100%
blocksize: 900K
cpufreq_governor: performance
In addition, the commit has a significant impact on the following test:
+------------------+----------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput -4.2% regression                     |
| test machine     | 48 threads 2 sockets Xeon E5 (Ivytown Ivy Bridge-EP) with 64G memory |
| test parameters  | cpufreq_governor=performance                                         |
|                  | ipc=socket                                                           |
|                  | mode=threads                                                         |
|                  | nr_threads=1600%                                                     |
+------------------+----------------------------------------------------------------------+
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached to this email
bin/lkp run job.yaml
=========================================================================================
blocksize/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/testcase:
900K/gcc-6/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/ivb42/pbzip2
commit:
e67589fcc1 ("mm, vmscan: add classzone information to tracepoints")
68df942cc6 ("mm, page_alloc: remove fair zone allocation policy")
e67589fcc119127c 68df942cc66426c0922a91768b
---------------- --------------------------
%stddev %change %stddev
\ | \
99347521 ± 0% -6.3% 93098631 ± 0% pbzip2.throughput
174.24 ±100% +134.3% 408.25 ± 1% pbzip2.time.system_time
18.75 ± 0% -5.5% 17.73 ± 0% turbostat.RAMWatt
104514 ± 1% -5.1% 99188 ± 0% vmstat.system.in
40698 ± 3% +1087.6% 483328 ± 0% meminfo.Active(anon)
488081 ± 0% -100.0% 17.75 ± 2% meminfo.Active(file)
26091843 ± 3% -15.2% 22114482 ± 4% numa-numastat.node0.local_node
26091945 ± 3% -15.2% 22114587 ± 4% numa-numastat.node0.numa_hit
10292 ± 2% -100.0% 0.00 ± -1% proc-vmstat.nr_alloc_batch
5482540 ± 0% -100.0% 0.00 ± 0% proc-vmstat.pgalloc_dma32
759939 ± 3% +25.5% 954027 ± 13% cpuidle.C1E-IVT.time
6624 ± 7% +27.8% 8464 ± 12% cpuidle.C1E-IVT.usage
1899377 ± 5% +22.1% 2318726 ± 14% cpuidle.C3-IVT.time
8853 ± 4% +23.4% 10929 ± 12% cpuidle.C3-IVT.usage
4931 ± 2% -100.0% 0.00 ± -1% numa-vmstat.node0.nr_alloc_batch
12999778 ± 2% -13.8% 11199799 ± 5% numa-vmstat.node0.numa_hit
12999732 ± 2% -13.8% 11199745 ± 5% numa-vmstat.node0.numa_local
5517 ± 3% -100.0% 0.00 ± -1% numa-vmstat.node1.nr_alloc_batch
25.53 ± 2% +17.6% 30.04 ± 6% sched_debug.cfs_rq:/.util_avg.stddev
10.11 ± 4% +21.5% 12.29 ± 9% sched_debug.cpu.clock.stddev
10.11 ± 4% +21.5% 12.29 ± 9% sched_debug.cpu.clock_task.stddev
383.50 ± 5% +23.4% 473.36 ± 5% sched_debug.cpu.sched_goidle.avg
1010 ± 7% +78.1% 1799 ± 34% sched_debug.cpu.sched_goidle.max
191.33 ± 16% +27.9% 244.71 ± 8% sched_debug.cpu.sched_goidle.min
125.22 ± 9% +86.8% 233.95 ± 33% sched_debug.cpu.sched_goidle.stddev
38132 ± 11% -21.2% 30039 ± 11% sched_debug.cpu.ttwu_count.max
36218 ± 13% -21.6% 28407 ± 12% sched_debug.cpu.ttwu_local.max
3.062e+11 ± 0% -6.0% 2.878e+11 ± 0% perf-stat.L1-dcache-load-misses
5.255e+12 ± 0% -6.3% 4.925e+12 ± 0% perf-stat.L1-dcache-loads
1.992e+10 ± 0% -7.9% 1.835e+10 ± 1% perf-stat.L1-dcache-prefetch-misses
1.591e+11 ± 0% -6.0% 1.496e+11 ± 1% perf-stat.L1-dcache-store-misses
3.128e+12 ± 0% -6.4% 2.929e+12 ± 0% perf-stat.L1-dcache-stores
2.014e+09 ± 1% +6.3% 2.142e+09 ± 1% perf-stat.L1-icache-load-misses
2.943e+10 ± 0% -4.7% 2.804e+10 ± 0% perf-stat.LLC-load-misses
1.164e+11 ± 0% -5.6% 1.099e+11 ± 0% perf-stat.LLC-loads
9.141e+09 ± 0% -4.5% 8.731e+09 ± 1% perf-stat.LLC-prefetch-misses
1.243e+10 ± 1% -4.4% 1.188e+10 ± 1% perf-stat.LLC-prefetches
3.776e+10 ± 1% -5.3% 3.577e+10 ± 0% perf-stat.LLC-store-misses
7.226e+10 ± 0% -5.1% 6.86e+10 ± 0% perf-stat.LLC-stores
2.85e+12 ± 0% -6.8% 2.658e+12 ± 0% perf-stat.branch-instructions
1.564e+11 ± 0% -6.4% 1.465e+11 ± 0% perf-stat.branch-load-misses
2.825e+12 ± 0% -6.0% 2.655e+12 ± 0% perf-stat.branch-loads
1.561e+11 ± 0% -5.9% 1.47e+11 ± 0% perf-stat.branch-misses
6.636e+10 ± 0% -4.3% 6.351e+10 ± 0% perf-stat.cache-misses
1.879e+11 ± 0% -4.7% 1.79e+11 ± 0% perf-stat.cache-references
63404 ± 1% -18.5% 51690 ± 2% perf-stat.cpu-migrations
5.25e+12 ± 0% -6.2% 4.924e+12 ± 0% perf-stat.dTLB-loads
3.129e+12 ± 0% -6.4% 2.928e+12 ± 0% perf-stat.dTLB-stores
2.054e+13 ± 0% -6.5% 1.92e+13 ± 0% perf-stat.instructions
39561733 ± 5% -6.4% 37045620 ± 2% perf-stat.minor-faults
2.943e+10 ± 0% -4.2% 2.818e+10 ± 0% perf-stat.node-loads
3.746e+10 ± 0% -4.6% 3.572e+10 ± 0% perf-stat.node-stores
39561720 ± 5% -6.4% 37045614 ± 2% perf-stat.page-faults
3.504e+13 ± 0% +1.4% 3.553e+13 ± 0% perf-stat.stalled-cycles-frontend
[Plot: pbzip2.throughput per sample, y-axis from 0 to 1.2e+08; [*] bisect-good sample, [O] bisect-bad sample]
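The %change column in the comparison above appears to be the relative difference between the two commits' mean values, measured against the first (parent) commit. A minimal sketch to check that reading against two rows of the pbzip2 table; the helper name percent_change is hypothetical and not part of lkp-tests:

```python
def percent_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed definition)."""
    return (new - base) / base * 100.0


# Mean values copied from the pbzip2 comparison table in this report:
# first argument is e67589fcc1 (parent), second is 68df942cc6.
throughput = percent_change(99347521, 93098631)
system_time = percent_change(174.24, 408.25)

print(f"pbzip2.throughput:       {throughput:+.1f}%")
print(f"pbzip2.time.system_time: {system_time:+.1f}%")
```

Both results round to the figures reported in the table (-6.3% and +134.3%), which is consistent with that definition. Note that the system_time row carries a ±100% stddev on the parent side, so its change is far less reliable than the throughput one.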
***************************************************************************************************
ivb42: 48 threads 2 sockets Xeon E5 (Ivytown Ivy Bridge-EP) with 64G memory
=========================================================================================
compiler/cpufreq_governor/ipc/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-6/performance/socket/x86_64-rhel/threads/1600%/debian-x86_64-2015-02-07.cgz/ivb42/hackbench
commit:
e67589fcc1 ("mm, vmscan: add classzone information to tracepoints")
68df942cc6 ("mm, page_alloc: remove fair zone allocation policy")
e67589fcc119127c 68df942cc66426c0922a91768b
---------------- --------------------------
%stddev %change %stddev
\ | \
147426 ± 0% -4.2% 141178 ± 0% hackbench.throughput
18.16 ± 0% -2.8% 17.65 ± 0% turbostat.RAMWatt
4306 ± 6% -100.0% 0.00 ± -1% numa-vmstat.node0.nr_alloc_batch
3908 ± 6% -100.0% 0.00 ± -1% numa-vmstat.node1.nr_alloc_batch
8210 ± 3% -100.0% 0.00 ± -1% proc-vmstat.nr_alloc_batch
610148 ± 6% -100.0% 0.00 ± 0% proc-vmstat.pgalloc_dma32
520899 ± 0% -45.4% 284616 ± 1% meminfo.Active
32878 ± 6% +765.6% 284607 ± 1% meminfo.Active(anon)
488019 ± 0% -100.0% 8.25 ± 5% meminfo.Active(file)
285266 ± 0% +71.1% 488045 ± 0% meminfo.Inactive(file)
1.94 ±142% +701.7% 15.55 ± 46% perf-profile.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
2.48 ±111% +526.5% 15.55 ± 46% perf-profile.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
47.95 ± 33% -45.6% 26.09 ± 15% perf-profile.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
50.02 ± 35% -47.8% 26.09 ± 15% perf-profile.cycles-pp.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read.vfs_read
3.64 ± 15% -37.3% 2.28 ± 25% sched_debug.cfs_rq:/.load_avg.min
23999708 ± 2% -11.0% 21367447 ± 2% sched_debug.cfs_rq:/.min_vruntime.max
879491 ± 6% -17.0% 730204 ± 0% sched_debug.cpu.avg_idle.avg
2636078 ± 13% -20.7% 2090915 ± 17% sched_debug.cpu.avg_idle.max
3.08 ± 18% -43.2% 1.75 ± 27% sched_debug.cpu.cpu_load[1].min
3.17 ± 19% -43.8% 1.78 ± 27% sched_debug.cpu.cpu_load[2].min
3.19 ± 20% -43.3% 1.81 ± 26% sched_debug.cpu.cpu_load[3].min
3.25 ± 19% -44.2% 1.81 ± 26% sched_debug.cpu.cpu_load[4].min
3987636 ± 1% -10.7% 3561972 ± 3% sched_debug.cpu.nr_switches.avg
3311971 ± 5% -11.2% 2940070 ± 2% sched_debug.cpu.nr_switches.min
0.11 ± 0% +12.5% 0.12 ± 0% sched_debug.rt_rq:/.rt_nr_running.max
5.478e+11 ± 0% -6.1% 5.142e+11 ± 1% perf-stat.L1-dcache-load-misses
1.175e+13 ± 0% -6.3% 1.1e+13 ± 0% perf-stat.L1-dcache-loads
6.36e+10 ± 0% -1.6% 6.26e+10 ± 0% perf-stat.L1-dcache-prefetch-misses
2.051e+11 ± 0% -6.3% 1.921e+11 ± 0% perf-stat.L1-dcache-store-misses
8.779e+12 ± 0% -6.2% 8.232e+12 ± 0% perf-stat.L1-dcache-stores
1.128e+11 ± 0% -27.1% 8.226e+10 ± 3% perf-stat.L1-icache-load-misses
8.95e+10 ± 0% -6.0% 8.412e+10 ± 0% perf-stat.LLC-load-misses
1.615e+11 ± 0% -6.7% 1.506e+11 ± 0% perf-stat.LLC-loads
2.055e+10 ± 0% -1.7% 2.02e+10 ± 1% perf-stat.LLC-prefetch-misses
3.953e+10 ± 0% -4.2% 3.786e+10 ± 1% perf-stat.LLC-prefetches
4.142e+10 ± 0% -6.4% 3.878e+10 ± 0% perf-stat.LLC-store-misses
5.819e+10 ± 0% -7.1% 5.407e+10 ± 1% perf-stat.LLC-stores
6.894e+12 ± 0% -6.3% 6.462e+12 ± 0% perf-stat.branch-instructions
1.295e+10 ± 2% -8.0% 1.192e+10 ± 3% perf-stat.branch-load-misses
6.883e+12 ± 0% -6.3% 6.449e+12 ± 0% perf-stat.branch-loads
1.197e+10 ± 0% -9.6% 1.083e+10 ± 1% perf-stat.branch-misses
2.913e+12 ± 0% -2.2% 2.848e+12 ± 0% perf-stat.bus-cycles
1.301e+11 ± 0% -6.4% 1.218e+11 ± 0% perf-stat.cache-misses
2.366e+11 ± 0% -6.6% 2.211e+11 ± 0% perf-stat.cache-references
4.045e+08 ± 1% -5.4% 3.826e+08 ± 3% perf-stat.context-switches
8.73e+13 ± 0% -2.2% 8.536e+13 ± 0% perf-stat.cpu-cycles
1.172e+13 ± 0% -6.2% 1.1e+13 ± 0% perf-stat.dTLB-loads
8.782e+12 ± 0% -6.3% 8.228e+12 ± 0% perf-stat.dTLB-stores
3.854e+13 ± 0% -6.2% 3.613e+13 ± 0% perf-stat.instructions
6.257e+09 ± 2% -7.5% 5.786e+09 ± 5% perf-stat.node-load-misses
8.953e+10 ± 0% -6.0% 8.417e+10 ± 0% perf-stat.node-loads
2.075e+10 ± 0% -2.0% 2.033e+10 ± 1% perf-stat.node-prefetches
4.114e+10 ± 0% -6.6% 3.842e+10 ± 0% perf-stat.node-stores
7.845e+13 ± 0% -2.3% 7.667e+13 ± 0% perf-stat.ref-cycles
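Each row in the tables above follows the pattern "base ± stddev% change% new ± stddev% metric". For anyone post-processing these reports, here is a rough sketch of parsing one row; the regular expression and the parse_row helper are assumptions based only on the rows shown in this mail, not an lkp-tests interface:

```python
import re

# Assumed row layout: <base> ± <sd>% <change>% <new> ± <sd>% <metric>
ROW = re.compile(
    r"^\s*(?P<base>\S+)\s+±\s*(?P<base_sd>-?\d+)%\s+"
    r"(?P<change>[+-][\d.]+)%\s+"
    r"(?P<new>\S+)\s+±\s*(?P<new_sd>-?\d+)%\s+"
    r"(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return the fields of one comparison row, or None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "base_stddev_pct": int(m["base_sd"]),
        "change_pct": float(m["change"]),
        "new": float(m["new"]),
        "new_stddev_pct": int(m["new_sd"]),
    }


row = parse_row("147426 ± 0% -4.2% 141178 ± 0% hackbench.throughput")
print(row["metric"], row["change_pct"])
```

Header and commit lines return None, so a whole report can be fed through line by line and only the data rows are kept.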
Thanks,
Xiaolong
View attachment "config-4.7.0-rc7-00249-g68df942" of type "text/plain" (151037 bytes)
View attachment "job.yaml" of type "text/plain" (3711 bytes)
View attachment "reproduce" of type "text/plain" (103 bytes)