Message-ID: <20190806070547.GA10123@xsang-OptiPlex-9020>
Date: Tue, 6 Aug 2019 15:05:47 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Minchan Kim <minchan@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>, Minchan Kim <minchan@...nel.org>,
Miguel de Dios <migueldedios@...gle.com>,
Wei Wang <wvw@...gle.com>, Michal Hocko <mhocko@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Mel Gorman <mgorman@...hsingularity.net>, lkp@...org
Subject: [mm] 755d6edc1a: will-it-scale.per_process_ops -4.1% regression
Greetings,
FYI, we noticed a -4.1% regression in will-it-scale.per_process_ops due to commit:
commit: 755d6edc1aee4489c90975ec093d724d5492cecd ("[PATCH] mm: release the spinlock on zap_pte_range")
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-release-the-spinlock-on-zap_pte_range/20190730-010638
in testcase: will-it-scale
on test machine: 8 threads Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz with 16G memory
with the following parameters:
nr_task: 100%
mode: process
test: malloc1
cpufreq_governor: performance
ucode: 0x21
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
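The malloc1 source is not included in this report. The profile further down is dominated by mmap (do_mmap), anonymous page faults (do_anonymous_page) and munmap (unmap_region, unmap_page_range), which suggests that each iteration maps a large anonymous region, touches it, and unmaps it. The following is a rough Python sketch of such a per-worker loop. The 128 MB size, the single-byte touch and the name malloc1_like_worker are assumptions for illustration, not will-it-scale's actual code.

```python
import mmap
import time

# Assumed size: large enough that a C malloc() would normally be served by
# mmap() instead of the heap. This value is not taken from the report.
SIZE = 128 * 1024 * 1024


def malloc1_like_worker(duration_s: float, size: int = SIZE) -> int:
    """Hypothetical approximation of one malloc1-style worker.

    Each iteration creates a private anonymous mapping, writes one byte
    (forcing a page fault), then unmaps the region. Returns how many
    iterations completed in roughly duration_s seconds.
    """
    iterations = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # fileno -1 gives an anonymous mapping; MAP_PRIVATE matches malloc's use.
        region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
        region[0] = 1    # first write faults in a page
        region.close()   # munmap the whole region
        iterations += 1
    return iterations


if __name__ == "__main__":
    print(malloc1_like_worker(0.5), "iterations in 0.5 s")
```

With mode: process and nr_task: 100%, one such loop would run in each of several parallel processes. The munmap half of each iteration is where the zap_pte_range path changed by the commit executes.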
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-05-14.cgz/lkp-ivb-d01/malloc1/will-it-scale/0x21
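The slash-separated header above pairs each name on the first line with the value in the same position on the second line. The following small helper is hypothetical and not part of lkp-tests. It makes the pairing explicit and assumes that no individual value contains a '/', which holds for this report.

```python
from typing import Dict

# The two header lines exactly as printed in the report.
KEYS = ("compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/"
        "tbox_group/test/testcase/ucode:")
VALUES = ("gcc-7/performance/x86_64-rhel-7.6/process/100%/"
          "debian-x86_64-2019-05-14.cgz/lkp-ivb-d01/malloc1/will-it-scale/0x21")


def parse_job_header(keys_line: str, values_line: str) -> Dict[str, str]:
    """Pair slash-separated names with the values in the same position."""
    keys = keys_line.rstrip(":").split("/")
    values = values_line.split("/")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))


if __name__ == "__main__":
    for key, value in parse_job_header(KEYS, VALUES).items():
        print(f"{key:17} {value}")
```

The resulting pairs agree with the parameter list at the top of the report: mode process, nr_task 100%, test malloc1, ucode 0x21.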
commit:
v5.3-rc2
755d6edc1a ("mm: release the spinlock on zap_pte_range")
        v5.3-rc2  755d6edc1aee4489c90975ec093
----------------  ---------------------------
       fail:runs  %reproduction    fail:runs
1:5 -20% :4 dmesg.RIP:__d_lookup_rcu
1:5 -20% :4 dmesg.RIP:mnt_drop_write
:5 20% 1:4 kmsg.ab33a8>]usb_hcd_irq
:5 20% 1:4 kmsg.b445f28>]usb_hcd_irq
:5 20% 1:4 kmsg.cdf63ef>]usb_hcd_irq
1:5 -20% :4 kmsg.d4af11>]usb_hcd_irq
1:5 -20% :4 kmsg.d9>]usb_hcd_irq
:5 20% 1:4 kmsg.f805d78>]usb_hcd_irq
5:5 -7% 4:4 perf-profile.calltrace.cycles-pp.error_entry
7:5 -39% 5:4 perf-profile.children.cycles-pp.error_entry
0:5 -1% 0:4 perf-profile.children.cycles-pp.error_exit
5:5 -30% 4:4 perf-profile.self.cycles-pp.error_entry
         %stddev     %change         %stddev
119757 -4.1% 114839 will-it-scale.per_process_ops
958059 -4.1% 918718 will-it-scale.workload
2429 ± 16% -34.5% 1591 ± 32% cpuidle.C1.usage
0.97 ± 88% -0.7 0.26 mpstat.cpu.all.idle%
78.40 +2.0% 80.00 vmstat.cpu.sy
45.42 +2.1% 46.38 turbostat.CorWatt
50.46 +2.0% 51.45 turbostat.PkgWatt
6641 ± 4% +8.6% 7215 ± 8% slabinfo.anon_vma_chain.num_objs
1327 ± 3% +23.0% 1632 ± 5% slabinfo.kmalloc-96.active_objs
1327 ± 3% +23.0% 1632 ± 5% slabinfo.kmalloc-96.num_objs
1235 ± 30% +37.7% 1700 ± 18% interrupts.29:PCI-MSI.409600-edge.eth0
4361 ± 81% +149.4% 10876 ± 32% interrupts.CPU0.NMI:Non-maskable_interrupts
4361 ± 81% +149.4% 10876 ± 32% interrupts.CPU0.PMI:Performance_monitoring_interrupts
1235 ± 30% +37.7% 1700 ± 18% interrupts.CPU7.29:PCI-MSI.409600-edge.eth0
93196 +9.1% 101723 ± 6% sched_debug.cfs_rq:/.load.min
15.37 ± 11% +13.6% 17.46 ± 3% sched_debug.cfs_rq:/.nr_spread_over.max
5.01 ± 11% +14.5% 5.74 ± 4% sched_debug.cfs_rq:/.nr_spread_over.stddev
53.80 ± 15% +41.6% 76.21 ± 7% sched_debug.cfs_rq:/.util_avg.stddev
60098 +1.6% 61056 proc-vmstat.nr_active_anon
6867 -1.2% 6781 proc-vmstat.nr_slab_unreclaimable
60098 +1.6% 61056 proc-vmstat.nr_zone_active_anon
5.757e+08 -4.2% 5.517e+08 proc-vmstat.numa_hit
5.757e+08 -4.2% 5.517e+08 proc-vmstat.numa_local
5.758e+08 -4.1% 5.52e+08 proc-vmstat.pgalloc_normal
2.881e+08 -4.1% 2.762e+08 proc-vmstat.pgfault
5.758e+08 -4.1% 5.52e+08 proc-vmstat.pgfree
2.861e+09 ± 41% +41.1% 4.038e+09 perf-stat.i.branch-instructions
41921318 ± 38% +34.9% 56552695 ± 2% perf-stat.i.cache-references
2.173e+10 ± 41% +34.9% 2.931e+10 perf-stat.i.cpu-cycles
2.26e+09 ± 41% +41.3% 3.194e+09 perf-stat.i.dTLB-stores
57813 ± 26% +66.7% 96370 ± 6% perf-stat.i.iTLB-loads
1.365e+10 ± 41% +37.9% 1.882e+10 perf-stat.i.instructions
661.20 ± 40% +45.4% 961.52 perf-stat.i.instructions-per-iTLB-miss
0.47 ± 41% +37.3% 0.64 perf-stat.i.ipc
948620 -3.5% 915067 perf-stat.i.minor-faults
948620 -3.5% 915067 perf-stat.i.page-faults
0.51 ± 7% -0.1 0.45 perf-stat.overall.branch-miss-rate%
1.59 -2.4% 1.56 perf-stat.overall.cpi
0.38 -0.0 0.35 ± 2% perf-stat.overall.dTLB-store-miss-rate%
875.11 +8.7% 950.89 perf-stat.overall.instructions-per-iTLB-miss
0.63 +2.4% 0.64 perf-stat.overall.ipc
4337585 ± 41% +42.3% 6173557 perf-stat.overall.path-length
2.855e+09 ± 41% +41.0% 4.028e+09 perf-stat.ps.branch-instructions
41833739 ± 38% +34.8% 56408902 ± 2% perf-stat.ps.cache-references
2.255e+09 ± 41% +41.2% 3.186e+09 perf-stat.ps.dTLB-stores
57677 ± 26% +66.7% 96124 ± 6% perf-stat.ps.iTLB-loads
1.362e+10 ± 41% +37.8% 1.877e+10 perf-stat.ps.instructions
946368 -3.6% 912714 perf-stat.ps.minor-faults
946368 -3.6% 912714 perf-stat.ps.page-faults
4.155e+12 ± 41% +36.5% 5.672e+12 perf-stat.total.instructions
20.10 -0.7 19.42 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
17.83 -0.7 17.17 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
5.47 ± 2% -0.5 4.92 ± 4% perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
5.75 ± 2% -0.5 5.20 ± 4% perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
5.69 ± 2% -0.5 5.17 ± 4% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
2.61 -0.5 2.16 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
2.09 ± 2% -0.4 1.67 ± 15% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
2.81 ± 2% -0.2 2.56 ± 2% perf-profile.calltrace.cycles-pp.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
2.62 ± 2% -0.2 2.45 ± 2% perf-profile.calltrace.cycles-pp.flush_tlb_func_common.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu.unmap_region
1.89 ± 2% -0.2 1.73 perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.__do_munmap.__vm_munmap
3.05 ± 2% -0.1 2.91 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
1.07 ± 3% -0.1 0.95 ± 2% perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.91 ± 3% -0.1 0.84 ± 4% perf-profile.calltrace.cycles-pp.native_flush_tlb.flush_tlb_func_common.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu
1.94 ± 3% +0.1 2.06 perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
1.31 ± 8% +0.1 1.45 perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.31 ± 81% +0.2 0.54 ± 3% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
2.27 ± 50% +0.7 2.97 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
43.67 +2.4 46.10 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
39.41 ± 2% +2.7 42.07 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
18.28 ± 2% +3.7 21.95 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
17.43 ± 2% +3.7 21.12 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
35.89 ± 50% +11.0 46.92 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.13 ± 50% +11.1 47.22 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
51.68 ± 50% +14.5 66.17 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
51.90 ± 50% +14.5 66.42 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
17.89 -0.7 17.20 perf-profile.children.cycles-pp.handle_mm_fault
20.13 -0.7 19.45 perf-profile.children.cycles-pp.__do_page_fault
5.25 ± 2% -0.6 4.62 ± 8% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
5.50 ± 2% -0.6 4.95 ± 4% perf-profile.children.cycles-pp.pagevec_lru_move_fn
5.93 ± 2% -0.5 5.39 ± 4% perf-profile.children.cycles-pp.lru_add_drain
5.86 ± 2% -0.5 5.33 ± 4% perf-profile.children.cycles-pp.lru_add_drain_cpu
2.80 ± 2% -0.3 2.55 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64
2.86 ± 2% -0.3 2.60 ± 2% perf-profile.children.cycles-pp.__anon_vma_prepare
1.92 ± 3% -0.2 1.75 perf-profile.children.cycles-pp.unlink_anon_vmas
1.88 ± 4% -0.2 1.72 ± 2% perf-profile.children.cycles-pp.percpu_counter_add_batch
2.03 ± 3% -0.2 1.88 perf-profile.children.cycles-pp.free_pgtables
3.06 ± 2% -0.1 2.92 perf-profile.children.cycles-pp.flush_tlb_mm_range
0.89 ± 5% -0.1 0.76 ± 6% perf-profile.children.cycles-pp.__might_sleep
1.58 ± 2% -0.1 1.45 perf-profile.children.cycles-pp.native_flush_tlb
1.97 -0.1 1.85 ± 2% perf-profile.children.cycles-pp.flush_tlb_func_common
0.41 ± 8% -0.1 0.32 ± 8% perf-profile.children.cycles-pp.___pte_free_tlb
0.10 ± 14% -0.1 0.03 ±100% perf-profile.children.cycles-pp.should_fail_alloc_page
0.55 ± 3% -0.1 0.49 ± 4% perf-profile.children.cycles-pp.down_write
0.10 ± 19% -0.1 0.05 ± 58% perf-profile.children.cycles-pp.should_failslab
0.28 ± 10% -0.0 0.23 perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
0.11 ± 19% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.policy_nodemask
0.10 ± 11% -0.0 0.06 ± 14% perf-profile.children.cycles-pp.__vma_link_file
0.11 ± 9% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.anon_vma_chain_link
0.13 ± 8% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.try_charge
0.18 ± 6% -0.0 0.16 ± 5% perf-profile.children.cycles-pp.inc_zone_page_state
0.14 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
0.10 ± 17% +0.0 0.14 ± 7% perf-profile.children.cycles-pp.strlen
0.52 ± 2% +0.0 0.56 ± 3% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
0.17 ± 16% +0.0 0.21 ± 6% perf-profile.children.cycles-pp.uncharge_page
0.08 ± 16% +0.0 0.13 ± 7% perf-profile.children.cycles-pp.__vma_link_list
0.26 ± 6% +0.1 0.31 ± 6% perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
0.00 +0.1 0.06 ± 22% perf-profile.children.cycles-pp.__get_vma_policy
0.13 ± 9% +0.1 0.19 ± 9% perf-profile.children.cycles-pp.vma_merge
0.02 ±122% +0.1 0.09 ± 11% perf-profile.children.cycles-pp.kthread_blkcg
0.25 ± 11% +0.1 0.33 ± 6% perf-profile.children.cycles-pp.get_task_policy
0.00 +0.1 0.08 ± 5% perf-profile.children.cycles-pp.memcpy
0.25 ± 9% +0.1 0.35 ± 2% perf-profile.children.cycles-pp.memcpy_erms
1.97 ± 2% +0.1 2.09 perf-profile.children.cycles-pp.get_unmapped_area
1.34 ± 7% +0.1 1.47 ± 2% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.38 ± 5% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.alloc_pages_current
3.08 ± 2% +0.2 3.24 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
64.46 +2.0 66.45 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
64.19 +2.0 66.19 perf-profile.children.cycles-pp.do_syscall_64
43.77 +2.4 46.18 perf-profile.children.cycles-pp.__do_munmap
44.49 +2.5 46.95 perf-profile.children.cycles-pp.__vm_munmap
44.77 +2.5 47.24 perf-profile.children.cycles-pp.__x64_sys_munmap
39.43 ± 2% +2.7 42.10 perf-profile.children.cycles-pp.unmap_region
18.07 ± 2% +3.7 21.73 perf-profile.children.cycles-pp.unmap_page_range
18.29 ± 2% +3.7 21.97 perf-profile.children.cycles-pp.unmap_vmas
6.02 ± 3% -0.5 5.57 ± 3% perf-profile.self.cycles-pp.do_syscall_64
1.73 -0.1 1.59 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
1.56 ± 2% -0.1 1.44 perf-profile.self.cycles-pp.native_flush_tlb
0.34 ± 11% -0.1 0.24 ± 7% perf-profile.self.cycles-pp.strlcpy
0.57 ± 5% -0.1 0.49 ± 6% perf-profile.self.cycles-pp.unlink_anon_vmas
0.68 ± 4% -0.1 0.60 ± 8% perf-profile.self.cycles-pp._raw_spin_lock
0.37 ± 5% -0.1 0.31 ± 6% perf-profile.self.cycles-pp.cpumask_any_but
0.42 ± 7% -0.1 0.36 ± 6% perf-profile.self.cycles-pp.handle_mm_fault
0.23 ± 7% -0.1 0.18 ± 4% perf-profile.self.cycles-pp.__perf_sw_event
0.10 ± 23% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.policy_nodemask
0.09 ± 11% -0.0 0.04 ± 59% perf-profile.self.cycles-pp.__vma_link_file
0.13 ± 6% -0.0 0.10 ± 8% perf-profile.self.cycles-pp.try_charge
0.14 ± 2% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
0.10 ± 15% +0.0 0.11 ± 4% perf-profile.self.cycles-pp.strlen
0.09 ± 17% +0.0 0.12 ± 5% perf-profile.self.cycles-pp.memcg_check_events
0.07 ± 19% +0.0 0.10 ± 7% perf-profile.self.cycles-pp.__vma_link_list
0.16 ± 16% +0.0 0.20 ± 5% perf-profile.self.cycles-pp.uncharge_page
0.24 ± 7% +0.0 0.28 ± 2% perf-profile.self.cycles-pp.memcpy_erms
0.04 ± 53% +0.0 0.09 ± 8% perf-profile.self.cycles-pp.do_page_fault
0.42 ± 9% +0.1 0.48 ± 7% perf-profile.self.cycles-pp.find_next_bit
0.13 ± 10% +0.1 0.19 ± 8% perf-profile.self.cycles-pp.vma_merge
0.02 ±122% +0.1 0.09 ± 11% perf-profile.self.cycles-pp.kthread_blkcg
0.25 ± 10% +0.1 0.32 ± 7% perf-profile.self.cycles-pp.get_task_policy
0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.memcpy
0.14 ± 5% +0.1 0.25 ± 15% perf-profile.self.cycles-pp.alloc_pages_current
3.08 ± 2% +0.2 3.23 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.43 ± 10% +0.2 0.58 ± 6% perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
11.00 ± 2% +3.6 14.56 perf-profile.self.cycles-pp.unmap_page_range
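Each row of the table above reads as follows: the value on v5.3-rc2 (with its %stddev across runs, where shown), the change, the value with 755d6edc1a (with its %stddev), and the metric name. Judging from the rows themselves, the change column uses two conventions. Metrics whose names end in '%', and all perf-profile.* rows, show an absolute difference in points. All other metrics show a relative percentage. The following sketch reproduces a few printed rows under that reading. It is an interpretation of the layout, not lkp-tests code.

```python
# Interpretation of the change column, inferred from the rows in this report
# (an assumption, not taken from lkp-tests source code).

ROWS = [
    # (value on v5.3-rc2, value with 755d6edc1a, metric name, change as printed)
    (119757, 114839, "will-it-scale.per_process_ops", "-4.1%"),
    (958059, 918718, "will-it-scale.workload", "-4.1%"),
    (78.40, 80.00, "vmstat.cpu.sy", "+2.0%"),
    (0.97, 0.26, "mpstat.cpu.all.idle%", "-0.7"),
    (18.07, 21.73, "perf-profile.children.cycles-pp.unmap_page_range", "+3.7"),
    (11.00, 14.56, "perf-profile.self.cycles-pp.unmap_page_range", "+3.6"),
]


def uses_absolute_change(metric: str) -> bool:
    """'...%' rows and perf-profile.* rows print a difference in points."""
    return metric.endswith("%") or metric.startswith("perf-profile.")


def format_change(base: float, new: float, metric: str) -> str:
    """Render the change column the way this report appears to."""
    if uses_absolute_change(metric):
        return f"{new - base:+.1f}"
    return f"{(new - base) / base * 100.0:+.1f}%"


if __name__ == "__main__":
    for base, new, metric, printed in ROWS:
        computed = format_change(base, new, metric)
        print(f"{metric:52} printed {printed:>6}  computed {computed:>6}")
```

The table prints rounded values, so recomputing from them does not always reproduce the printed change. For example, perf-stat.overall.cpi goes from 1.59 to 1.56 but is printed as -2.4%. Separately, will-it-scale.workload is about 8 times per_process_ops in both columns (958059 / 119757 is approximately 8.00). This appears consistent with nr_task: 100% meaning 8 processes on this 8-thread machine.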
will-it-scale.per_process_ops
[ASCII plot omitted: per-run will-it-scale.per_process_ops, y-axis 112000 to 120000. Bisect-good samples lie near 119000 to 120000; bisect-bad samples fall between roughly 113000 and 116500.]
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Oliver Sang
Attachment "config-5.3.0-rc2-00001-g755d6edc1aee44" (text/plain, 199591 bytes)
Attachment "job-script" (text/plain, 7364 bytes)
Attachment "job.yaml" (text/plain, 4989 bytes)
Attachment "reproduce" (text/plain, 310 bytes)