Message-ID: <20191015004254.GS9415@shao2-debian>
Date: Tue, 15 Oct 2019 08:42:54 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: virtio-dev@...ts.oasis-open.org, kvm@...r.kernel.org,
mst@...hat.com, david@...hat.com, dave.hansen@...el.com,
linux-kernel@...r.kernel.org, willy@...radead.org,
mhocko@...nel.org, linux-mm@...ck.org, akpm@...ux-foundation.org,
mgorman@...hsingularity.net, vbabka@...e.cz, osalvador@...e.de,
yang.zhang.wz@...il.com, pagupta@...hat.com,
konrad.wilk@...cle.com, nitesh@...hat.com, riel@...riel.com,
lcapitulino@...hat.com, wei.w.wang@...el.com, aarcange@...hat.com,
pbonzini@...hat.com, dan.j.williams@...el.com,
alexander.h.duyck@...ux.intel.com, lkp@...ts.01.org
Subject: [mm] 2eca680594: will-it-scale.per_process_ops -2.5% regression
Greetings,
FYI, we noticed a -2.5% regression of will-it-scale.per_process_ops due to commit:
commit: 2eca680594818153ac6a1be3ad8e964184169bf2 ("[PATCH v11 2/6] mm: Use zone and order instead of free area in free_list manipulators")
url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/mm-virtio-Provide-support-for-unused-page-reporting/20191002-024207
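For orientation, the commit title describes switching the free_list manipulators from taking a precomputed free_area pointer to taking the zone and order and looking the free_area up internally. The snippet below is a minimal, self-contained sketch of that kind of interface change using stub types invented for illustration; it is not the kernel patch, and the real helpers in mm/page_alloc.c differ in naming, locking, and data structures.

/*
 * Minimal stand-alone sketch (stub types, not kernel code) of the kind of
 * change the commit title describes: a free-list helper that takes the
 * zone and order and looks up the free_area itself, instead of having the
 * caller pass a precomputed free_area pointer.
 */
#include <stdio.h>

#define MAX_ORDER     11
#define MIGRATE_TYPES 4

struct page_stub {                      /* stand-in for struct page */
	struct page_stub *next;
};

struct free_area_stub {                 /* stand-in for struct free_area */
	struct page_stub *free_list[MIGRATE_TYPES];
	unsigned long nr_free;
};

struct zone_stub {                      /* stand-in for struct zone */
	struct free_area_stub free_area[MAX_ORDER];
};

/* Old-style calling convention: caller passes the free_area it looked up. */
static void add_to_free_area(struct page_stub *page,
			     struct free_area_stub *area, int migratetype)
{
	page->next = area->free_list[migratetype];
	area->free_list[migratetype] = page;
	area->nr_free++;
}

/* New-style calling convention: caller passes zone and order instead. */
static void add_to_free_list(struct page_stub *page, struct zone_stub *zone,
			     unsigned int order, int migratetype)
{
	struct free_area_stub *area = &zone->free_area[order];

	page->next = area->free_list[migratetype];
	area->free_list[migratetype] = page;
	area->nr_free++;
}

int main(void)
{
	static struct zone_stub zone;
	static struct page_stub a, b;

	add_to_free_area(&a, &zone.free_area[3], 0);   /* old interface */
	add_to_free_list(&b, &zone, 3, 0);             /* new interface */

	printf("order-3 nr_free = %lu\n", zone.free_area[3].nr_free);
	return 0;
}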
in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with the following parameters:
nr_task: 100%
mode: process
test: page_fault2
cpufreq_governor: performance
ucode: 0xb000038
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
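For readers unfamiliar with the workload: page_fault2 is a fault-heavy microbenchmark, and the profiles below are dominated by the page-fault allocation path and the munmap free path. The following is a rough, self-contained sketch of that general mmap/touch/munmap pattern; it is not the will-it-scale source, and details such as how the mapping is backed and how operations are counted differ in the real test.

/*
 * Illustrative sketch only (not the will-it-scale page_fault2 source):
 * repeatedly map a region, write one byte per page so every page takes a
 * minor fault and a page allocation, then unmap so the pages flow back
 * through the free path that shows up in the profiles below.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE (128UL * 1024 * 1024)   /* 128 MiB per iteration (arbitrary) */

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned long faults = 0;

	for (int iter = 0; iter < 10; iter++) {
		char *buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED)
			return 1;

		/* one write per page => one fault + one page allocation */
		for (unsigned long off = 0; off < MAP_SIZE; off += page_size) {
			buf[off] = 1;
			faults++;
		}

		munmap(buf, MAP_SIZE);   /* pages go back to the allocator */
	}

	printf("pages touched: %lu\n", faults);
	return 0;
}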
If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <rong.a.chen@...el.com>
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-09-23.cgz/lkp-bdw-ep6/page_fault2/will-it-scale/0xb000038
commit:
2f16feee6a ("mm: Adjust shuffle code to allow for future coalescing")
2eca680594 ("mm: Use zone and order instead of free area in free_list manipulators")
2f16feee6a912d6b            2eca680594818153ac6a1be3ad8
----------------            ---------------------------
       fail:runs   %reproduction    fail:runs
             3:4             -2%          2:4   perf-profile.self.cycles-pp.error_entry

         %stddev         %change      %stddev
84981 -2.5% 82888 will-it-scale.per_process_ops
7478397 -2.5% 7294217 will-it-scale.workload
614224 ± 3% -8.7% 560976 ± 4% meminfo.DirectMap4k
0.00 ± 86% +0.0 0.00 ± 27% mpstat.cpu.all.soft%
8560 ± 99% -99.4% 51.25 ± 13% numa-numastat.node1.other_node
1331 ± 45% +791.3% 11867 ± 87% turbostat.C1
43.50 +3.1% 44.85 boot-time.boot
3387 +4.0% 3523 boot-time.idle
109720 ± 11% +237.3% 370072 ± 95% cpuidle.C1.time
5131 ± 10% +178.3% 14281 ± 76% cpuidle.C1.usage
13736 ± 3% -7.7% 12672 numa-vmstat.node0.nr_slab_reclaimable
10240 ± 5% +10.2% 11280 numa-vmstat.node1.nr_slab_reclaimable
54947 ± 3% -7.8% 50687 ± 2% numa-meminfo.node0.KReclaimable
54947 ± 3% -7.8% 50687 ± 2% numa-meminfo.node0.SReclaimable
40956 ± 5% +10.2% 45122 numa-meminfo.node1.KReclaimable
40956 ± 5% +10.2% 45122 numa-meminfo.node1.SReclaimable
2.256e+09 -2.4% 2.202e+09 proc-vmstat.numa_hit
2.256e+09 -2.4% 2.202e+09 proc-vmstat.numa_local
2.258e+09 -2.4% 2.203e+09 proc-vmstat.pgalloc_normal
2.249e+09 -2.4% 2.195e+09 proc-vmstat.pgfault
2.255e+09 -2.4% 2.202e+09 proc-vmstat.pgfree
148.70 ± 8% -23.7% 113.47 ± 12% sched_debug.cfs_rq:/.nr_spread_over.stddev
-62259 -195.9% 59734 ± 38% sched_debug.cfs_rq:/.spread0.avg
68724 ± 31% +174.2% 188414 ± 12% sched_debug.cfs_rq:/.spread0.max
650.62 ± 13% +21.0% 787.54 ± 6% sched_debug.cfs_rq:/.util_avg.min
77.78 ± 21% -31.8% 53.07 ± 9% sched_debug.cfs_rq:/.util_avg.stddev
40.08 ± 36% -66.7% 13.33 ±107% sched_debug.cfs_rq:/.util_est_enqueued.min
266102 ± 49% -60.6% 104930 ± 2% sched_debug.cpu.avg_idle.stddev
22597 ± 8% -27.9% 16297 ± 10% sched_debug.cpu.nr_switches.max
3715 ± 2% -19.5% 2992 ± 7% sched_debug.cpu.nr_switches.stddev
19360 ± 10% -31.3% 13306 ± 9% sched_debug.cpu.sched_count.max
3208 ± 4% -24.6% 2420 ± 9% sched_debug.cpu.sched_count.stddev
2.21 ± 84% +117.0% 4.79 ± 14% sched_debug.cpu.sched_goidle.min
9763 ± 13% -37.4% 6112 ± 13% sched_debug.cpu.ttwu_count.max
1549 ± 5% -27.3% 1126 ± 10% sched_debug.cpu.ttwu_count.stddev
9112 ± 10% -37.9% 5657 ± 14% sched_debug.cpu.ttwu_local.max
1443 ± 3% -29.6% 1015 ± 12% sched_debug.cpu.ttwu_local.stddev
199.25 ± 22% +34.6% 268.25 ± 36% interrupts.36:IR-PCI-MSI.1572867-edge.eth0-TxRx-2
199.25 ± 22% +34.6% 268.25 ± 36% interrupts.CPU15.36:IR-PCI-MSI.1572867-edge.eth0-TxRx-2
47.25 ± 59% +475.1% 271.75 ± 48% interrupts.CPU17.RES:Rescheduling_interrupts
59.75 ± 61% +148.1% 148.25 ± 31% interrupts.CPU18.RES:Rescheduling_interrupts
35.00 ± 93% +406.4% 177.25 ± 47% interrupts.CPU19.RES:Rescheduling_interrupts
2910 ± 3% +9.7% 3192 ± 8% interrupts.CPU2.CAL:Function_call_interrupts
33.50 ±115% +410.4% 171.00 ± 58% interrupts.CPU21.RES:Rescheduling_interrupts
3033 ± 4% +17.3% 3557 ± 8% interrupts.CPU22.CAL:Function_call_interrupts
2965 ± 6% +13.9% 3379 ± 5% interrupts.CPU27.CAL:Function_call_interrupts
202.75 ± 34% -50.8% 99.75 ± 49% interrupts.CPU28.RES:Rescheduling_interrupts
134.00 ± 32% +243.8% 460.75 ± 92% interrupts.CPU31.RES:Rescheduling_interrupts
90.25 ±108% +467.6% 512.25 ± 91% interrupts.CPU44.RES:Rescheduling_interrupts
454.75 ± 74% -78.4% 98.25 ± 82% interrupts.CPU49.RES:Rescheduling_interrupts
4916 ± 34% +60.4% 7885 interrupts.CPU55.NMI:Non-maskable_interrupts
4916 ± 34% +60.4% 7885 interrupts.CPU55.PMI:Performance_monitoring_interrupts
33.25 ±110% +273.7% 124.25 ± 27% interrupts.CPU61.RES:Rescheduling_interrupts
8.00 ± 81% +2500.0% 208.00 ± 97% interrupts.CPU65.RES:Rescheduling_interrupts
105.25 ±114% +368.2% 492.75 ± 64% interrupts.CPU69.RES:Rescheduling_interrupts
224.00 ± 50% -76.2% 53.25 ±121% interrupts.CPU70.RES:Rescheduling_interrupts
41976580 -4.3% 40191219 perf-stat.i.branch-misses
4.657e+08 -3.1% 4.511e+08 ± 2% perf-stat.i.cache-misses
1.446e+09 -3.3% 1.398e+09 perf-stat.i.cache-references
540.00 +203.9% 1641 ±114% perf-stat.i.cycles-between-cache-misses
72449681 -3.7% 69791647 ± 2% perf-stat.i.dTLB-store-misses
6.748e+09 -3.7% 6.499e+09 perf-stat.i.dTLB-stores
15000441 -3.3% 14499685 perf-stat.i.iTLB-load-misses
48416 ± 8% -30.9% 33446 ± 36% perf-stat.i.iTLB-loads
7390548 -3.4% 7136366 ± 2% perf-stat.i.minor-faults
1.31e+08 -4.1% 1.256e+08 perf-stat.i.node-loads
866429 -5.4% 819527 perf-stat.i.node-store-misses
32410281 -5.1% 30770659 perf-stat.i.node-stores
7390641 -3.4% 7136212 ± 2% perf-stat.i.page-faults
21.34 -2.1% 20.90 perf-stat.overall.MPKI
0.28 -0.0 0.27 perf-stat.overall.branch-miss-rate%
521.95 +2.1% 532.97 perf-stat.overall.cycles-between-cache-misses
4515 +2.1% 4612 perf-stat.overall.instructions-per-iTLB-miss
2752428 +2.3% 2816574 perf-stat.overall.path-length
41829479 -4.2% 40062021 perf-stat.ps.branch-misses
4.641e+08 -3.1% 4.498e+08 perf-stat.ps.cache-misses
1.441e+09 -3.3% 1.394e+09 perf-stat.ps.cache-references
72196133 -3.6% 69580505 ± 2% perf-stat.ps.dTLB-store-misses
6.724e+09 -3.6% 6.479e+09 perf-stat.ps.dTLB-stores
14947871 -3.3% 14455433 perf-stat.ps.iTLB-load-misses
48333 ± 7% -31.0% 33349 ± 36% perf-stat.ps.iTLB-loads
7365449 -3.4% 7115485 perf-stat.ps.minor-faults
1.305e+08 -4.0% 1.253e+08 perf-stat.ps.node-loads
863374 -5.4% 817033 perf-stat.ps.node-store-misses
32297524 -5.0% 30677371 perf-stat.ps.node-stores
7365171 -3.4% 7115053 perf-stat.ps.page-faults
8.09 -1.3 6.82 perf-profile.calltrace.cycles-pp.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
7.99 -1.3 6.73 perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
5.97 -1.2 4.74 perf-profile.calltrace.cycles-pp.__lru_cache_add.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault
5.87 -1.2 4.64 perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault.__handle_mm_fault
4.59 -1.2 3.40 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte
4.62 -1.2 3.44 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault
56.80 -0.7 56.09 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
55.72 -0.7 55.02 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
57.16 -0.7 56.45 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
57.26 -0.7 56.56 perf-profile.calltrace.cycles-pp.page_fault
55.27 -0.7 54.57 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
2.15 ± 2% -0.5 1.65 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.unmap_page_range
2.16 ± 2% -0.5 1.67 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas
8.78 -0.2 8.59 perf-profile.calltrace.cycles-pp.copy_user_highpage.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
8.66 -0.2 8.48 perf-profile.calltrace.cycles-pp.copy_page.copy_user_highpage.__handle_mm_fault.handle_mm_fault.__do_page_fault
0.93 -0.0 0.90 perf-profile.calltrace.cycles-pp.__pagevec_lru_add_fn.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault
4.11 +0.1 4.17 perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
4.15 +0.1 4.22 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
4.13 +0.1 4.20 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
3.72 +0.1 3.85 perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
3.66 +0.1 3.79 perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.tlb_finish_mmu
33.34 +0.7 34.03 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
33.34 +0.7 34.02 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
32.04 +0.7 32.76 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
31.80 +0.7 32.52 perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas.unmap_region
37.53 +0.8 38.28 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
37.53 +0.8 38.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.50 +0.8 38.26 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
37.50 +0.8 38.26 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.50 +0.8 38.26 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.50 +0.8 38.26 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
35.45 +0.8 36.22 perf-profile.calltrace.cycles-pp.alloc_pages_vma.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
34.73 +0.8 35.50 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault.handle_mm_fault
35.13 +0.8 35.91 perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault.handle_mm_fault.__do_page_fault
32.89 +0.8 33.70 perf-profile.calltrace.cycles-pp._raw_spin_lock.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault
32.80 +0.8 33.61 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma
28.87 +1.2 30.10 perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas
28.36 +1.2 29.61 perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.unmap_page_range
30.85 +1.4 32.26 perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu
30.77 +1.4 32.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages
7.09 -1.8 5.33 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
8.09 -1.3 6.83 perf-profile.children.cycles-pp.finish_fault
8.03 -1.3 6.77 perf-profile.children.cycles-pp.alloc_set_pte
5.89 -1.2 4.65 perf-profile.children.cycles-pp.pagevec_lru_move_fn
5.98 -1.2 4.75 perf-profile.children.cycles-pp.__lru_cache_add
56.83 -0.7 56.11 perf-profile.children.cycles-pp.__do_page_fault
55.76 -0.7 55.05 perf-profile.children.cycles-pp.handle_mm_fault
57.16 -0.7 56.46 perf-profile.children.cycles-pp.do_page_fault
57.30 -0.7 56.60 perf-profile.children.cycles-pp.page_fault
55.30 -0.7 54.60 perf-profile.children.cycles-pp.__handle_mm_fault
8.78 -0.2 8.59 perf-profile.children.cycles-pp.copy_user_highpage
8.69 -0.2 8.51 perf-profile.children.cycles-pp.copy_page
0.41 -0.0 0.39 perf-profile.children.cycles-pp.__mod_lruvec_state
4.16 +0.1 4.22 perf-profile.children.cycles-pp.tlb_finish_mmu
70.71 +0.5 71.19 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
33.34 +0.7 34.03 perf-profile.children.cycles-pp.unmap_vmas
33.34 +0.7 34.03 perf-profile.children.cycles-pp.unmap_page_range
37.61 +0.7 38.36 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
37.61 +0.7 38.36 perf-profile.children.cycles-pp.do_syscall_64
37.50 +0.8 38.26 perf-profile.children.cycles-pp.__do_munmap
37.50 +0.8 38.26 perf-profile.children.cycles-pp.__vm_munmap
37.50 +0.8 38.26 perf-profile.children.cycles-pp.unmap_region
37.50 +0.8 38.26 perf-profile.children.cycles-pp.__x64_sys_munmap
35.47 +0.8 36.25 perf-profile.children.cycles-pp.alloc_pages_vma
34.87 +0.8 35.65 perf-profile.children.cycles-pp.get_page_from_freelist
36.18 +0.8 36.96 perf-profile.children.cycles-pp.tlb_flush_mmu
36.03 +0.8 36.82 perf-profile.children.cycles-pp.release_pages
35.25 +0.8 36.04 perf-profile.children.cycles-pp.__alloc_pages_nodemask
32.62 +1.4 34.00 perf-profile.children.cycles-pp.free_unref_page_list
32.04 +1.4 33.44 perf-profile.children.cycles-pp.free_pcppages_bulk
64.81 +2.2 67.01 perf-profile.children.cycles-pp._raw_spin_lock
8.65 -0.2 8.46 perf-profile.self.cycles-pp.copy_page
0.94 -0.0 0.89 perf-profile.self.cycles-pp.get_page_from_freelist
1.13 -0.0 1.10 perf-profile.self.cycles-pp._raw_spin_lock
70.71 +0.5 71.19 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
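As a quick check on how the %change column relates to the two value columns in the tables above, the headline per_process_ops numbers work out as follows (a trivial stand-alone calculation, using only the values reported above):

/* %change column = (new - old) / old * 100, with the headline numbers above. */
#include <stdio.h>

int main(void)
{
	double old_ops = 84981.0;   /* 2f16feee6a: will-it-scale.per_process_ops */
	double new_ops = 82888.0;   /* 2eca680594: will-it-scale.per_process_ops */

	printf("%%change = %.2f%%\n", (new_ops - old_ops) / old_ops * 100.0);
	/* prints "%change = -2.46%", i.e. the reported -2.5% regression */
	return 0;
}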
will-it-scale.per_process_ops

  [ASCII trend plot: per_process_ops per sample, y-axis 0 to 90000; O marks bisect-bad samples]

will-it-scale.workload

  [ASCII trend plot: workload per sample, y-axis 0 to 8e+06; O marks bisect-bad samples]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen