Message-ID: <202212191714.524e00b3-yujie.liu@intel.com>
Date: Mon, 19 Dec 2022 17:58:52 +0800
From: kernel test robot <yujie.liu@...el.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>
CC: Yu Zhao <yuzhao@...gle.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
Catalin Marinas <catalin.marinas@....com>,
David Hildenbrand <david@...hat.com>,
"David Howells" <dhowells@...hat.com>,
Davidlohr Bueso <dave@...olabs.net>,
"SeongJae Park" <sj@...nel.org>,
Sven Schnelle <svens@...ux.ibm.com>,
Will Deacon <will@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
<oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<maple-tree@...ts.infradead.org>, <ying.huang@...el.com>,
<feng.tang@...el.com>, <zhengjun.xing@...ux.intel.com>,
<fengwei.yin@...el.com>
Subject: [linus:master] will-it-scale.per_thread_ops -40.2% regression in
mmap1 benchmark
Greetings,
FYI, we noticed a -40.2% regression of will-it-scale.per_thread_ops between
mainline commits e15e06a83923 (base) and 524e00b36e8c (regressed), which
span the following maple tree series:
524e00b36e8c5 mm: remove rb tree.
0c563f1480435 proc: remove VMA rbtree use from nommu
d0cf3dd47f0d5 damon: convert __damon_va_three_regions to use the VMA iterator
c9dbe82cb99db kernel/fork: use maple tree for dup_mmap() during forking
3499a13168da6 mm/mmap: use maple tree for unmapped_area{_topdown}
7fdbd37da5c6f mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
be8432e7166ef mm/mmap: use the maple tree in find_vma() instead of the rbtree.
2e3af1db17442 mmap: use the VMA iterator in count_vma_pages_range()
f39af05949a42 mm: add VMA iterator
d4af56c5c7c67 mm: start tracking VMAs with maple tree
e15e06a839232 lib/test_maple_tree: add testing for maple tree
in testcase: will-it-scale
on test machine: 104 threads 2 sockets (Skylake) with 192G memory
with following parameters:
nr_task: 50%
mode: thread
test: mmap1
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based variant of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
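For context, the mmap1 testcase body is a tight anonymous map/unmap loop per
worker. A minimal sketch, reconstructed from the will-it-scale repo at the
test-url above (the 128MB size and exact flags are our assumption, not
verified against the tree at this commit):

  #include <assert.h>
  #include <sys/mman.h>

  #define MEMSIZE (128UL * 1024 * 1024)

  /* The harness runs N copies of this loop and reads back *iterations,
   * which feeds the per_thread_ops metric reported above. */
  void testcase(unsigned long long *iterations, unsigned long nr)
  {
          (void)nr;
          while (1) {
                  /* each iteration takes the mm's mmap_lock for writing
                   * twice: once in mmap() and once in munmap() */
                  char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                  assert(c != MAP_FAILED);
                  munmap(c, MEMSIZE);
                  (*iterations)++;
          }
  }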
We couldn't pinpoint the commit that introduced this regression because
some of the above commits failed to boot during bisection, but it looks
related to the maple tree code; note the new mas_*/mt_find entries and the
increased osq_lock spinning in the perf profile below. A sketch of the
changed lookup path follows, then the full details:
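The most visible change on this path is the VMA lookup itself: commit
be8432e7166ef replaces the rbtree descent in find_vma() with a maple tree
range query. Paraphrased (not quoted verbatim from the tree at this commit;
error handling elided):

  struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
  {
          unsigned long index = addr;

          mmap_assert_locked(mm);
          /* first VMA with vm_end > addr; this is the mt_find() hit
           * visible in the profile below */
          return mt_find(&mm->mm_mt, &index, ULONG_MAX);
  }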
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/mmap1/will-it-scale
commit:
e15e06a839232 ("lib/test_maple_tree: add testing for maple tree")
524e00b36e8c5 ("mm: remove rb tree.")
e15e06a8392321a1 524e00b36e8c547f5582eef3fb6
---------------- ---------------------------
%stddev %change %stddev
\ | \
238680 -40.2% 142816 will-it-scale.52.threads
4589 -40.2% 2746 will-it-scale.per_thread_ops
238680 -40.2% 142816 will-it-scale.workload
0.28 -0.1 0.20 ± 3% mpstat.cpu.all.usr%
7758 -1.6% 7636 proc-vmstat.nr_mapped
0.03 ± 14% +40.0% 0.05 ± 10% time.system_time
35.87 ± 41% -17.2 18.71 ± 92% turbostat.C1E%
14.11 ±105% +15.5 29.62 ± 52% turbostat.C6%
466662 ± 3% +20.3% 561351 ± 3% turbostat.POLL
42.33 +3.9% 44.00 turbostat.PkgTmp
838.08 ± 49% -50.7% 412.94 ± 19% sched_debug.cfs_rq:/.load_avg.max
466231 ± 14% -53.4% 217040 ± 82% sched_debug.cfs_rq:/.min_vruntime.min
-335910 +146.5% -828023 sched_debug.cfs_rq:/.spread0.min
602391 ± 4% +6.5% 641749 ± 4% sched_debug.cpu.avg_idle.avg
26455 ± 7% +16.1% 30723 ± 6% sched_debug.cpu.nr_switches.max
230323 ± 6% +42.4% 327946 ± 3% numa-numastat.node0.local_node
257238 ± 2% +29.2% 332446 numa-numastat.node0.numa_hit
26826 ± 35% -83.1% 4532 ±138% numa-numastat.node0.other_node
344370 ± 3% -26.8% 251981 ± 2% numa-numastat.node1.local_node
351214 ± 2% -19.9% 281185 numa-numastat.node1.numa_hit
6779 ±139% +330.8% 29204 ± 21% numa-numastat.node1.other_node
111776 ± 8% +43.9% 160892 ± 17% numa-meminfo.node0.AnonHugePages
163879 ± 5% +34.9% 221083 ± 21% numa-meminfo.node0.AnonPages
182360 ± 2% +39.7% 254705 ± 15% numa-meminfo.node0.AnonPages.max
167687 ± 4% +33.0% 223029 ± 20% numa-meminfo.node0.Inactive
165329 ± 4% +34.9% 223029 ± 20% numa-meminfo.node0.Inactive(anon)
2357 ±131% -100.0% 0.00 numa-meminfo.node0.Inactive(file)
2087 ± 11% +22.1% 2548 ± 9% numa-meminfo.node0.PageTables
170594 ± 7% -27.5% 123611 ± 23% numa-meminfo.node1.AnonHugePages
238127 ± 3% -23.9% 181170 ± 25% numa-meminfo.node1.AnonPages
278201 ± 3% -26.8% 203778 ± 22% numa-meminfo.node1.AnonPages.max
244262 ± 2% -24.0% 185599 ± 25% numa-meminfo.node1.Inactive
244206 ± 2% -24.1% 185419 ± 25% numa-meminfo.node1.Inactive(anon)
20767 ± 64% -48.4% 10717 ±124% numa-meminfo.node1.Mapped
40936 ± 5% +34.9% 55213 ± 21% numa-vmstat.node0.nr_anon_pages
41317 ± 4% +34.8% 55700 ± 20% numa-vmstat.node0.nr_inactive_anon
41317 ± 4% +34.8% 55700 ± 20% numa-vmstat.node0.nr_zone_inactive_anon
257331 ± 2% +29.2% 332536 numa-vmstat.node0.numa_hit
230417 ± 5% +42.4% 328036 ± 3% numa-vmstat.node0.numa_local
26826 ± 35% -83.1% 4532 ±138% numa-vmstat.node0.numa_other
59518 ± 4% -24.0% 45237 ± 25% numa-vmstat.node1.nr_anon_pages
61041 ± 3% -24.2% 46287 ± 25% numa-vmstat.node1.nr_inactive_anon
5196 ± 64% -48.7% 2666 ±126% numa-vmstat.node1.nr_mapped
61041 ± 3% -24.2% 46287 ± 25% numa-vmstat.node1.nr_zone_inactive_anon
351314 ± 2% -20.0% 281191 numa-vmstat.node1.numa_hit
344470 ± 4% -26.8% 251987 ± 2% numa-vmstat.node1.numa_local
6779 ±139% +330.8% 29204 ± 21% numa-vmstat.node1.numa_other
3.12 ± 10% -25.7% 2.32 ± 2% perf-stat.i.MPKI
3.111e+09 +4.4% 3.247e+09 perf-stat.i.branch-instructions
0.43 -0.0 0.39 perf-stat.i.branch-miss-rate%
13577850 -5.5% 12837395 perf-stat.i.branch-misses
38.85 ± 3% +4.6 43.44 ± 3% perf-stat.i.cache-miss-rate%
47922345 ± 10% -21.9% 37423833 ± 2% perf-stat.i.cache-references
9.42 -5.1% 8.94 perf-stat.i.cpi
0.02 -0.0 0.01 perf-stat.i.dTLB-load-miss-rate%
632005 -28.8% 449814 perf-stat.i.dTLB-load-misses
4.127e+09 +3.8% 4.282e+09 perf-stat.i.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 11% perf-stat.i.dTLB-store-miss-rate%
3.131e+08 +26.5% 3.962e+08 perf-stat.i.dTLB-stores
599587 ± 8% -20.0% 479492 ± 6% perf-stat.i.iTLB-load-misses
2324378 -12.7% 2028806 ± 7% perf-stat.i.iTLB-loads
1.54e+10 +5.4% 1.622e+10 perf-stat.i.instructions
25907 ± 7% +31.4% 34030 ± 6% perf-stat.i.instructions-per-iTLB-miss
0.11 +5.4% 0.11 perf-stat.i.ipc
570.88 ± 8% -22.1% 444.53 ± 2% perf-stat.i.metric.K/sec
72.60 +5.0% 76.20 perf-stat.i.metric.M/sec
90.37 +1.5 91.82 perf-stat.i.node-load-miss-rate%
7458505 ± 2% -27.2% 5431142 ± 3% perf-stat.i.node-load-misses
795163 -39.1% 484036 perf-stat.i.node-loads
3.11 ± 10% -25.9% 2.31 ± 2% perf-stat.overall.MPKI
0.44 -0.0 0.40 perf-stat.overall.branch-miss-rate%
38.72 ± 3% +4.5 43.24 ± 3% perf-stat.overall.cache-miss-rate%
9.40 -5.1% 8.93 perf-stat.overall.cpi
0.02 -0.0 0.01 perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 6% -0.0 0.00 ± 11% perf-stat.overall.dTLB-store-miss-rate%
25842 ± 7% +31.5% 33976 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.11 +5.4% 0.11 perf-stat.overall.ipc
90.36 +1.4 91.81 perf-stat.overall.node-load-miss-rate%
19478525 +76.1% 34307144 perf-stat.overall.path-length
3.101e+09 +4.4% 3.236e+09 perf-stat.ps.branch-instructions
13536210 -5.5% 12794692 perf-stat.ps.branch-misses
47758992 ± 10% -21.9% 37302259 ± 2% perf-stat.ps.cache-references
629957 -28.8% 448327 perf-stat.ps.dTLB-load-misses
4.113e+09 +3.8% 4.268e+09 perf-stat.ps.dTLB-loads
3.121e+08 +26.5% 3.949e+08 perf-stat.ps.dTLB-stores
597514 ± 8% -20.0% 477834 ± 6% perf-stat.ps.iTLB-load-misses
2316405 -12.7% 2021878 ± 7% perf-stat.ps.iTLB-loads
1.535e+10 +5.4% 1.617e+10 perf-stat.ps.instructions
7434434 ± 2% -27.2% 5412315 ± 3% perf-stat.ps.node-load-misses
792675 -39.1% 482405 perf-stat.ps.node-loads
4.648e+12 +5.4% 4.9e+12 perf-stat.total.instructions
24.16 ± 66% -16.4 7.77 ±122% perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
24.16 ± 66% -16.4 7.77 ±122% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
33.88 ± 20% -9.6 24.32 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.13 ± 6% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.04 ± 20% -8.9 24.13 ± 6% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
0.38 ± 70% +0.2 0.61 ± 2% perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.6 0.56 ± 2% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.00 +0.6 0.57 ± 2% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.00 +0.6 0.60 ± 3% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
31.73 ± 10% +4.5 36.24 ± 2% perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
31.50 ± 10% +4.6 36.05 perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
33.19 ± 10% +4.6 37.77 ± 2% perf-profile.calltrace.cycles-pp.__munmap
32.39 ± 10% +4.6 36.97 ± 2% perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.34 ± 10% +4.6 36.94 ± 2% perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64
33.08 ± 10% +4.6 37.69 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
32.15 ± 10% +4.6 36.76 perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
33.05 ± 10% +4.6 37.66 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.31 ± 10% +4.6 36.92 ± 2% perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap
32.10 ± 10% +4.6 36.73 perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.07 ± 10% +4.6 36.70 perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.86 ± 10% +4.7 37.56 perf-profile.calltrace.cycles-pp.__mmap
32.74 ± 10% +4.7 37.48 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
32.71 ± 10% +4.8 37.46 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
32.62 ± 10% +4.8 37.39 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
24.31 ± 66% -16.4 7.88 ±122% perf-profile.children.cycles-pp.intel_idle
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.cpu_startup_entry
33.89 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.do_idle
33.85 ± 20% -9.3 24.57 ± 6% perf-profile.children.cycles-pp.mwait_idle_with_hints
33.88 ± 20% -9.3 24.60 ± 6% perf-profile.children.cycles-pp.cpuidle_enter
33.88 ± 20% -9.3 24.60 ± 6% perf-profile.children.cycles-pp.cpuidle_enter_state
33.88 ± 20% -9.3 24.61 ± 6% perf-profile.children.cycles-pp.cpuidle_idle_call
33.05 ± 20% -8.9 24.14 ± 6% perf-profile.children.cycles-pp.start_secondary
0.84 ± 25% -0.4 0.48 ± 16% perf-profile.children.cycles-pp.start_kernel
0.84 ± 25% -0.4 0.48 ± 16% perf-profile.children.cycles-pp.arch_call_rest_init
0.84 ± 25% -0.4 0.48 ± 16% perf-profile.children.cycles-pp.rest_init
0.16 ± 12% -0.1 0.08 perf-profile.children.cycles-pp.unmap_region
0.14 ± 11% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.13 ± 12% -0.0 0.10 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.mas_wr_node_store
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.memset_erms
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.mas_wr_modify
0.00 +0.1 0.07 ± 6% perf-profile.children.cycles-pp.kmem_cache_free_bulk
0.53 ± 10% +0.1 0.61 ± 2% perf-profile.children.cycles-pp.__do_munmap
0.00 +0.1 0.08 ± 5% perf-profile.children.cycles-pp.mas_destroy
0.00 +0.1 0.09 ± 5% perf-profile.children.cycles-pp.mt_find
0.00 +0.1 0.10 perf-profile.children.cycles-pp.mas_spanning_rebalance
0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.mas_wr_spanning_store
0.00 +0.1 0.12 ± 4% perf-profile.children.cycles-pp.mas_rev_awalk
0.00 +0.1 0.13 perf-profile.children.cycles-pp.mas_empty_area_rev
0.00 +0.1 0.14 ± 5% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.2 0.16 ± 5% perf-profile.children.cycles-pp.mas_alloc_nodes
0.00 +0.2 0.17 ± 4% perf-profile.children.cycles-pp.mas_preallocate
0.42 ± 15% +0.2 0.60 ± 3% perf-profile.children.cycles-pp.do_mmap
0.06 ± 7% +0.2 0.27 perf-profile.children.cycles-pp.vma_link
0.20 ± 14% +0.2 0.41 ± 4% perf-profile.children.cycles-pp.mmap_region
0.00 +0.3 0.35 ± 4% perf-profile.children.cycles-pp.mas_store_prealloc
0.78 ± 8% +0.4 1.13 ± 2% perf-profile.children.cycles-pp.rwsem_spin_on_owner
33.20 ± 10% +4.6 37.77 ± 2% perf-profile.children.cycles-pp.__munmap
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.children.cycles-pp.__x64_sys_munmap
32.98 ± 10% +4.6 37.61 ± 2% perf-profile.children.cycles-pp.__vm_munmap
32.86 ± 10% +4.7 37.56 perf-profile.children.cycles-pp.__mmap
32.62 ± 10% +4.8 37.40 perf-profile.children.cycles-pp.vm_mmap_pgoff
63.26 ± 10% +9.1 72.32 ± 2% perf-profile.children.cycles-pp.osq_lock
64.54 ± 10% +9.2 73.72 ± 2% perf-profile.children.cycles-pp.down_write_killable
64.44 ± 10% +9.2 73.66 ± 2% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
64.38 ± 10% +9.2 73.62 ± 2% perf-profile.children.cycles-pp.rwsem_optimistic_spin
65.87 ± 10% +9.3 75.21 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
65.79 ± 10% +9.4 75.15 ± 2% perf-profile.children.cycles-pp.do_syscall_64
33.85 ± 20% -9.3 24.57 ± 6% perf-profile.self.cycles-pp.mwait_idle_with_hints
0.29 ± 19% -0.1 0.14 ± 3% perf-profile.self.cycles-pp.rwsem_optimistic_spin
0.13 ± 10% -0.0 0.09 ± 9% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.09 ± 9% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.down_write_killable
0.13 ± 12% -0.0 0.10 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 perf-profile.self.cycles-pp.memset_erms
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.kmem_cache_free_bulk
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.1 0.08 perf-profile.self.cycles-pp.mt_find
0.00 +0.1 0.11 ± 4% perf-profile.self.cycles-pp.mas_rev_awalk
0.76 ± 8% +0.4 1.12 ± 2% perf-profile.self.cycles-pp.rwsem_spin_on_owner
62.94 ± 10% +9.0 71.91 ± 2% perf-profile.self.cycles-pp.osq_lock
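One reading of the profile above (our interpretation, not a bisect result):
both the __mmap and __munmap paths serialize on the mm's write lock through
down_write_killable(), where most cycles are osq_lock spinning inside
rwsem_optimistic_spin (63.26% -> 72.32%), while the new maple tree work
(mas_preallocate, mas_store_prealloc, mt_find) now runs inside that critical
section and perf-stat.overall.path-length grows +76.1%. A longer hold time
per map/unmap makes every other thread spin longer. A minimal userspace
analogue of that effect (illustrative toy, not kernel code; thread count and
iteration constants are arbitrary):

  /* Build: cc -O2 -pthread contend.c -o contend
   * Usage: ./contend [threads] [hold_iters]
   * Every thread funnels through one writer lock (standing in for
   * mmap_lock).  Raising hold_iters -- extra work done while holding the
   * lock -- lowers aggregate throughput for all threads. */
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define LOOPS 100000

  static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
  static unsigned long hold_iters;

  static void *worker(void *arg)
  {
          for (int i = 0; i < LOOPS; i++) {
                  pthread_rwlock_wrlock(&lock);
                  for (volatile unsigned long j = 0; j < hold_iters; j++)
                          ;                       /* critical-section work */
                  pthread_rwlock_unlock(&lock);
          }
          return NULL;
  }

  int main(int argc, char **argv)
  {
          int nthreads = argc > 1 ? atoi(argv[1]) : 52;
          hold_iters = argc > 2 ? strtoul(argv[2], NULL, 0) : 100;
          pthread_t t[nthreads];
          struct timespec a, b;

          clock_gettime(CLOCK_MONOTONIC, &a);
          for (int i = 0; i < nthreads; i++)
                  pthread_create(&t[i], NULL, worker, NULL);
          for (int i = 0; i < nthreads; i++)
                  pthread_join(t[i], NULL);
          clock_gettime(CLOCK_MONOTONIC, &b);

          double secs = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
          printf("%.0f lock acquisitions/sec\n",
                 nthreads * (double)LOOPS / secs);
          return 0;
  }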
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202212191714.524e00b3-yujie.liu@intel.com
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
Attachments:
  config-6.1.0-rc7-00211-g0ba09b173387 (text/plain, 166140 bytes)
  job-script (text/plain, 7672 bytes)
  job.yaml (text/plain, 5189 bytes)
  reproduce (text/plain, 344 bytes)