Message-ID: <20171204015231.GJ25368@yexl-desktop>
Date: Mon, 4 Dec 2017 09:52:31 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Michal Hocko <mhocko@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [lkp-robot] [mm] 6f12c591f0: vm-scalability.throughput -3.8%
regression
Greetings,
FYI, we noticed a -3.8% regression of vm-scalability.throughput due to commit:
commit: 6f12c591f0ac4a63bb7f451c0bbb5f4c81a81147 ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: vm-scalability
on test machine: 128 threads 4 Sockets Haswell-EP with 512G memory
with following parameters:
runtime: 300s
size: 1T
test: lru-shm
cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
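For reference, the -3.8% figure works out directly from the throughput values in the comparison table below:

        (4659025 - 4844237) / 4844237 ~= -0.038, i.e. roughly a -3.8% drop in vm-scalability.throughput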
In addition to that, the commit also has a significant impact on the following test:
+------------------+----------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -4.6% regression |
| test machine | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory |
| test parameters | cpufreq_governor=performance |
| | mode=process |
| | nr_task=50% |
| | test=page_fault3 |
+------------------+----------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run job.yaml
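To compare the two kernels yourself, a minimal sketch (not the exact 0-day flow; building and booting a kernel at each commit is assumed to happen out of band) could look like:

        # hypothetical loop: boot a kernel built at each commit, then rerun the attached job
        for commit in 0b6db939a1 6f12c591f0; do
                echo "boot a kernel built at $commit, then press enter"; read dummy
                bin/lkp run job.yaml    # same attached job file as above
        done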
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/300s/1T/lkp-hsw-4ep1/lru-shm/vm-scalability
commit:
0b6db939a1 ("mm: memcontrol: implement lruvec stat functions on top of each other")
6f12c591f0 ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
0b6db939a1a57c03 6f12c591f0ac4a63bb7f451c0b
---------------- --------------------------
%stddev %change %stddev
\ | \
1.49 +33.8% 2.00 vm-scalability.free_time
37303 -3.6% 35960 vm-scalability.median
4844237 -3.8% 4659025 vm-scalability.throughput
125155 ± 3% +8.5% 135756 ± 3% vm-scalability.time.involuntary_context_switches
6782 +3.2% 7000 vm-scalability.time.percent_of_cpu_this_job_got
2643 +36.3% 3602 ± 4% vm-scalability.time.user_time
1.587e+08 ± 9% -22.3% 1.234e+08 ± 5% cpuidle.POLL.time
6.09 +2.0 8.10 ± 3% mpstat.cpu.usr%
858297 +21.7% 1044527 softirqs.RCU
675.00 +2.4% 691.00 turbostat.Avg_MHz
4688 ± 2% +35.7% 6362 ± 7% vmstat.system.cs
1.787e+09 +3.1% 1.842e+09 perf-stat.cache-misses
1619987 ± 2% +37.6% 2229517 ± 7% perf-stat.context-switches
6.04 +4.7% 6.33 perf-stat.cpi
2.983e+13 +3.9% 3.1e+13 perf-stat.cpu-cycles
47218 +6.3% 50207 perf-stat.cpu-migrations
74.29 -1.7 72.63 perf-stat.iTLB-load-miss-rate%
2.543e+08 -8.1% 2.337e+08 perf-stat.iTLB-load-misses
19419 +8.0% 20974 perf-stat.instructions-per-iTLB-miss
0.17 -4.5% 0.16 perf-stat.ipc
4.962e+08 ± 2% +9.7% 5.441e+08 ± 3% perf-stat.node-load-misses
7.65 ± 2% +3.4 11.00 ± 2% perf-stat.node-store-miss-rate%
75787197 ± 2% +47.5% 1.118e+08 ± 2% perf-stat.node-store-misses
9.154e+08 -1.3% 9.039e+08 perf-stat.node-stores
75077 +20.9% 90786 ± 8% sched_debug.cfs_rq:/.exec_clock.avg
84968 +23.0% 104516 ± 10% sched_debug.cfs_rq:/.exec_clock.max
65257 +24.3% 81120 ± 7% sched_debug.cfs_rq:/.exec_clock.min
10207581 +21.4% 12395197 ± 8% sched_debug.cfs_rq:/.min_vruntime.avg
10783232 +21.3% 13075917 ± 8% sched_debug.cfs_rq:/.min_vruntime.max
7641468 ± 4% +21.1% 9254043 ± 8% sched_debug.cfs_rq:/.min_vruntime.min
8.15 ± 2% +22.2% 9.95 ± 8% sched_debug.cfs_rq:/.nr_spread_over.avg
5.86 ± 6% +16.4% 6.82 ± 4% sched_debug.cfs_rq:/.nr_spread_over.stddev
-1214677 +58.7% -1927998 sched_debug.cfs_rq:/.spread0.min
92184 +18.6% 109345 ± 8% sched_debug.cpu.nr_load_updates.avg
115923 ± 2% +17.7% 136412 ± 6% sched_debug.cpu.nr_load_updates.max
5680 ± 2% +50.7% 8559 ± 7% sched_debug.cpu.nr_switches.avg
1063 ± 4% +12.4% 1195 ± 4% sched_debug.cpu.nr_switches.min
12017 ± 12% +49.1% 17913 ± 6% sched_debug.cpu.nr_switches.stddev
4822 ± 3% +60.1% 7719 ± 8% sched_debug.cpu.sched_count.avg
624.35 ± 9% +29.9% 811.18 ± 11% sched_debug.cpu.sched_count.min
11704 ± 13% +50.8% 17646 ± 7% sched_debug.cpu.sched_count.stddev
787.60 ± 5% +18.5% 933.11 ± 13% sched_debug.cpu.sched_goidle.stddev
2222 ± 3% +63.7% 3637 ± 8% sched_debug.cpu.ttwu_count.avg
197.45 ± 2% +14.1% 225.33 ± 5% sched_debug.cpu.ttwu_count.min
5988 ± 12% +47.7% 8846 ± 7% sched_debug.cpu.ttwu_count.stddev
1857 ± 4% +71.5% 3185 ± 8% sched_debug.cpu.ttwu_local.avg
134.85 +17.1% 157.92 ± 7% sched_debug.cpu.ttwu_local.min
5806 ± 14% +50.0% 8708 ± 8% sched_debug.cpu.ttwu_local.stddev
44.45 -2.6 41.86 perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault
83.65 -2.5 81.12 perf-profile.calltrace.cycles-pp.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
83.64 -2.5 81.10 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
82.93 -2.5 80.47 perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault
89.34 -0.7 88.62 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
92.33 -0.2 92.16 perf-profile.calltrace.cycles-pp.page_fault
89.62 -0.1 89.48 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
92.05 -0.1 91.93 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
92.05 -0.1 91.93 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
6.68 -0.0 6.68 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.shmem_alloc_page.shmem_alloc_and_acct_page
7.16 +0.2 7.32 perf-profile.calltrace.cycles-pp.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fault.__do_fault
7.03 +0.2 7.20 perf-profile.calltrace.cycles-pp.alloc_pages_vma.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fault
6.86 +0.2 7.03 perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_pages_vma.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp
9.64 ± 2% +0.2 9.84 ± 2% perf-profile.calltrace.cycles-pp.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault
47.05 -3.1 43.95 perf-profile.children.cycles-pp.clear_page_erms
83.65 -2.5 81.11 perf-profile.children.cycles-pp.shmem_fault
83.66 -2.5 81.13 perf-profile.children.cycles-pp.__do_fault
83.18 -2.5 80.67 perf-profile.children.cycles-pp.shmem_getpage_gfp
89.42 -0.7 88.70 perf-profile.children.cycles-pp.__handle_mm_fault
92.38 -0.2 92.20 perf-profile.children.cycles-pp.page_fault
89.68 -0.1 89.56 perf-profile.children.cycles-pp.handle_mm_fault
92.13 -0.1 92.01 perf-profile.children.cycles-pp.__do_page_fault
92.12 -0.1 92.00 perf-profile.children.cycles-pp.do_page_fault
7.12 +0.0 7.15 perf-profile.children.cycles-pp.get_page_from_freelist
7.18 +0.1 7.33 perf-profile.children.cycles-pp.shmem_alloc_page
7.07 +0.2 7.24 perf-profile.children.cycles-pp.alloc_pages_vma
7.08 +0.2 7.26 perf-profile.children.cycles-pp.__alloc_pages_nodemask
9.66 ± 2% +0.2 9.86 ± 2% perf-profile.children.cycles-pp.shmem_alloc_and_acct_page
45.60 -2.6 42.96 perf-profile.self.cycles-pp.clear_page_erms
23.38 -1.4 22.01 ± 2% perf-profile.self.cycles-pp.shmem_getpage_gfp
vm-scalability.throughput
4.9e+06 +-+---------------------------------------------------------------+
|.++. .++ +. :+ .+ + : +.++.+.+. +.+.++.+. +.+. +.+.+. +.+. +.|
4.8e+06 +-+ + + + + + :+ +: + + + + + |
| + + + |
4.7e+06 +-+ O O |
| O OO O OO |
4.6e+06 +-+ O O O O |
| |
4.5e+06 +-+ |
O O O O |
4.4e+06 +-O O O |
| O |
4.3e+06 +-+ O |
| |
4.2e+06 +-+---------------------------------------------------------------+
vm-scalability.free_time
2.4 +-+-------------------------------------------------------------------+
2.3 +-+ O |
| O |
2.2 +-O O |
2.1 +-+ O O O |
2 O-+ O O O O O OO O O |
1.9 +-+ O O O O |
| |
1.8 +-+ |
1.7 +-+ |
1.6 +-+ .+. |
1.5 +-+.+ .+. + .+. .+. .+ .+. .+ .+ +.+ |
|.+ +.+ +.++.+.+. + +.+. .+.++.+ ++ +.+ +.+ + + +.+.|
1.4 +-+ + + |
1.3 +-+-------------------------------------------------------------------+
perf-stat.node-store-misses
1.2e+08 +-+--------------------------------------------------------------+
1.15e+08 +-+ O |
O O O O OO O O O |
1.1e+08 +-OO OO O OO OO O O |
1.05e+08 +-+ |
1e+08 +-+ |
9.5e+07 +-+ |
| |
9e+07 +-+ |
8.5e+07 +-+ |
8e+07 +-+ +. +.+. .+ .+.++. .+ +. .+ +. +.++. + |
7.5e+07 +-+:.+.+ +.+ ++.+ + + :.+.+ + :.+.+ +. : +. :+|
| + + + + : + |
7e+07 +-+ + |
6.5e+07 +-+--------------------------------------------------------------+
perf-stat.node-store-miss-rate_
11.5 +-+-------------------------O---O------------------------------------+
11 O-+ O O O O O O O O O O |
| O O O O O O O |
10.5 +-O |
10 +-+ |
| |
9.5 +-+ |
9 +-+ |
8.5 +-+ |
| |
8 +-+ .+ + +. .+. +.+.+. .+.|
7.5 +-+. .+.+. +.+ + : + .+ +.+.+ .+.++.+. .++ +. : ++ |
| ++ + ++. : + +.+ +.+ + |
7 +-+ + |
6.5 +-+------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
***************************************************************************************************
lkp-bdw-ep3d: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/process/50%/debian-x86_64-2016-08-31.cgz/lkp-bdw-ep3d/page_fault3/will-it-scale
commit:
0b6db939a1 ("mm: memcontrol: implement lruvec stat functions on top of each other")
6f12c591f0 ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
0b6db939a1a57c03 6f12c591f0ac4a63bb7f451c0b
---------------- --------------------------
%stddev %change %stddev
\ | \
1212116 -4.6% 1156875 will-it-scale.per_process_ops
233916 ± 2% -12.9% 203708 ± 13% meminfo.DirectMap4k
28669 ± 4% +7.4% 30788 ± 3% turbostat.C1E
106544 +21.9% 129864 ± 9% turbostat.C3
32957756 ± 5% +59.2% 52463265 ± 25% cpuidle.C3.time
106823 +21.8% 130151 ± 9% cpuidle.C3.usage
2.258e+08 ± 5% +26.8% 2.863e+08 ± 16% cpuidle.POLL.time
0.02 +0.0 0.06 ± 4% perf-stat.branch-miss-rate%
1.607e+09 +142.4% 3.896e+09 ± 3% perf-stat.branch-misses
26.96 -4.2 22.77 perf-stat.cache-miss-rate%
9.147e+10 +21.2% 1.108e+11 perf-stat.cache-references
1.09 +3.1% 1.13 perf-stat.cpi
9.904e+12 -4.0% 9.512e+12 perf-stat.dTLB-loads
3.669e+11 -4.6% 3.502e+11 perf-stat.dTLB-store-misses
5.412e+12 -3.8% 5.204e+12 perf-stat.dTLB-stores
52.18 -1.8 50.36 ± 2% perf-stat.iTLB-load-miss-rate%
0.91 -3.0% 0.89 perf-stat.ipc
1.604e+10 -4.6% 1.531e+10 perf-stat.minor-faults
8.27 ± 2% -0.8 7.43 ± 2% perf-stat.node-load-miss-rate%
1.013e+08 +18.1% 1.196e+08 ± 2% perf-stat.node-load-misses
1.124e+09 ± 2% +32.6% 1.49e+09 perf-stat.node-loads
1.00 +4.1 5.11 perf-stat.node-store-miss-rate%
1.631e+08 +398.2% 8.125e+08 ± 3% perf-stat.node-store-misses
1.613e+10 -6.4% 1.51e+10 ± 3% perf-stat.node-stores
1.604e+10 -4.6% 1.531e+10 perf-stat.page-faults
14.34 -0.8 13.56 perf-profile.calltrace.cycles-pp.native_irq_return_iret
36.78 -0.5 36.24 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
38.03 -0.4 37.66 perf-profile.calltrace.cycles-pp.secondary_startup_64
37.18 -0.4 36.82 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
37.18 -0.4 36.82 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
37.18 -0.4 36.82 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
37.18 -0.4 36.82 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
9.87 -0.2 9.66 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
10.01 -0.2 9.81 perf-profile.calltrace.cycles-pp.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
9.36 -0.2 9.18 perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault
8.66 -0.1 8.52 perf-profile.calltrace.cycles-pp.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault
5.82 +0.0 5.82 perf-profile.calltrace.cycles-pp.find_get_entry.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault
6.94 +0.7 7.62 perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.sys_munmap.entry_SYSCALL_64_fastpath
6.96 +0.7 7.64 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
6.94 +0.7 7.62 perf-profile.calltrace.cycles-pp.sys_munmap.entry_SYSCALL_64_fastpath
6.94 +0.7 7.62 perf-profile.calltrace.cycles-pp.vm_munmap.sys_munmap.entry_SYSCALL_64_fastpath
6.94 +0.7 7.62 perf-profile.calltrace.cycles-pp.do_munmap.vm_munmap.sys_munmap.entry_SYSCALL_64_fastpath
6.92 +0.7 7.60 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_munmap.vm_munmap
6.92 +0.7 7.60 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_munmap.vm_munmap.sys_munmap
34.31 +0.7 35.04 perf-profile.calltrace.cycles-pp.page_fault
32.71 +0.8 33.55 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
32.61 +0.8 33.46 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
22.08 +1.1 23.14 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
23.74 +1.5 25.26 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
4.63 +2.3 6.89 perf-profile.calltrace.cycles-pp.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
4.37 +2.3 6.66 perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
14.34 -0.8 13.56 perf-profile.children.cycles-pp.native_irq_return_iret
37.63 -0.5 37.08 perf-profile.children.cycles-pp.intel_idle
38.03 -0.4 37.66 perf-profile.children.cycles-pp.secondary_startup_64
38.03 -0.4 37.66 perf-profile.children.cycles-pp.cpu_startup_entry
38.03 -0.4 37.66 perf-profile.children.cycles-pp.do_idle
38.03 -0.4 37.66 perf-profile.children.cycles-pp.cpuidle_enter_state
37.18 -0.4 36.82 perf-profile.children.cycles-pp.start_secondary
10.09 -0.2 9.89 perf-profile.children.cycles-pp.shmem_fault
10.15 -0.2 9.95 perf-profile.children.cycles-pp.__do_fault
9.58 -0.2 9.39 perf-profile.children.cycles-pp.shmem_getpage_gfp
8.89 -0.2 8.74 perf-profile.children.cycles-pp.find_lock_entry
5.95 -0.0 5.95 perf-profile.children.cycles-pp.find_get_entry
7.06 +0.6 7.70 perf-profile.children.cycles-pp.unmap_page_range
6.98 +0.7 7.66 perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
6.95 +0.7 7.62 perf-profile.children.cycles-pp.sys_munmap
6.95 +0.7 7.62 perf-profile.children.cycles-pp.vm_munmap
6.95 +0.7 7.62 perf-profile.children.cycles-pp.do_munmap
6.93 +0.7 7.60 perf-profile.children.cycles-pp.unmap_vmas
6.95 +0.7 7.62 perf-profile.children.cycles-pp.unmap_region
34.31 +0.7 35.05 perf-profile.children.cycles-pp.page_fault
32.99 +0.8 33.81 perf-profile.children.cycles-pp.do_page_fault
33.15 +0.8 33.98 perf-profile.children.cycles-pp.__do_page_fault
22.28 +1.1 23.36 perf-profile.children.cycles-pp.__handle_mm_fault
23.97 +1.5 25.48 perf-profile.children.cycles-pp.handle_mm_fault
4.69 +2.3 6.97 perf-profile.children.cycles-pp.finish_fault
4.57 +2.4 6.93 perf-profile.children.cycles-pp.alloc_set_pte
14.34 -0.8 13.56 perf-profile.self.cycles-pp.native_irq_return_iret
37.63 -0.5 37.08 perf-profile.self.cycles-pp.intel_idle
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong