Message-ID: <202210181535.7144dd15-yujie.liu@intel.com>
Date: Tue, 18 Oct 2022 16:44:59 +0800
From: kernel test robot <yujie.liu@...el.com>
To: Rik van Riel <riel@...riel.com>
CC: <lkp@...ts.01.org>, <lkp@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Yang Shi <shy828301@...il.com>,
Matthew Wilcox <willy@...radead.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<ying.huang@...el.com>, <feng.tang@...el.com>,
<zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression
Hi Rik,
Please note that we reported
[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression
at
https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
when this commit was on
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
Now we have noticed that the commit has been merged into mainline, and
we still observe a similar test result.
We are not sure whether this is an expected performance change from
switching to huge pages, or whether the change benefits other use
cases, so we are reporting it again FYI. Please feel free to ignore
this report if the result is as expected. Thanks.
Greetings,
FYI, we noticed a -95.5% regression of will-it-scale.per_process_ops due to commit:
commit: f35b5d7d676e59e401690b678cd3cfec5e785c23 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
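For context on what the commit does: anonymous mmap() requests of at
least PMD size (2 MiB on x86-64) are now given a PMD-aligned start
address, so the mapping can be backed by transparent huge pages from
the first fault. A minimal userspace demonstration of the effect (our
illustrative sketch, not part of the patch; the 8 MiB size is
arbitrary):

#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

#define SZ (8UL << 20)	/* 8 MiB, larger than one 2 MiB PMD */

int main(void)
{
	void *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	/* With the commit applied, this offset is expected to be 0. */
	printf("addr %% 2MiB = %#lx\n",
	       (unsigned long)p & ((2UL << 20) - 1));
	munmap(p, SZ);
	return 0;
}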
in testcase: will-it-scale
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
with the following parameters:
nr_task: 100%
mode: process
test: malloc1
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
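The malloc1 testcase exercises exactly the path the commit changes:
each iteration allocates a buffer large enough that glibc serves it
with mmap(), touches it, and frees it again, so the kernel fault and
munmap paths dominate. A simplified approximation of the inner loop
(the real testcase is in the will-it-scale repository above; the
100 MB size and touch pattern here are illustrative):

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALLOC_SIZE (100UL * 1024 * 1024)  /* well above glibc's mmap threshold */

static void malloc1_iteration(void)
{
	unsigned char *mem = malloc(ALLOC_SIZE);  /* becomes an mmap() call */
	assert(mem);
	memset(mem, 0, ALLOC_SIZE);               /* fault the pages in */
	free(mem);                                /* becomes munmap() */
}

With THP-aligned mappings, every iteration faults in and zeroes 2 MiB
huge pages instead of 4 KiB pages, which matches the thp_fault_alloc
and clear_huge_page numbers below.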
In addition to that, the commit also has a significant impact on the following tests:
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.numa.ops_per_sec -52.9% regression |
| test machine | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory |
| test parameters | class=cpu |
| | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=numa |
| | testtime=60s |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -92.9% regression |
| test machine | 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory |
| test parameters | cpufreq_governor=performance |
| | mode=process |
| | nr_task=16 |
| | test=malloc1 |
+------------------+----------------------------------------------------------------------------------------------------+
Details are as below:
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp2/malloc1/will-it-scale
commit:
7b5a0b664e ("mm/page_ext: remove unused variable in offline_page_ext")
f35b5d7d67 ("mm: align larger anonymous mappings on THP boundaries")
7b5a0b664ebe2625 f35b5d7d676e59e401690b678cd
---------------- ---------------------------
%stddev %change %stddev
\ | \
2765078 -95.5% 124266 ± 4% will-it-scale.128.processes
21601 -95.5% 970.33 ± 4% will-it-scale.per_process_ops
2765078 -95.5% 124266 ± 4% will-it-scale.workload
1943 -3.5% 1874 vmstat.system.cs
0.68 +1.6 2.32 ± 4% mpstat.cpu.all.irq%
0.00 ± 3% +0.0 0.03 ± 4% mpstat.cpu.all.soft%
0.89 ± 2% -0.4 0.49 ± 2% mpstat.cpu.all.usr%
0.09 ± 4% -88.7% 0.01 turbostat.IPC
351.67 +11.2% 391.20 turbostat.PkgWatt
29.30 +299.5% 117.04 turbostat.RAMWatt
8.251e+08 -95.5% 37387027 numa-numastat.node0.local_node
8.252e+08 -95.5% 37467051 numa-numastat.node0.numa_hit
8.405e+08 -95.5% 38161942 ± 7% numa-numastat.node1.local_node
8.406e+08 -95.5% 38196663 ± 7% numa-numastat.node1.numa_hit
174409 ± 9% -21.4% 137126 ± 2% meminfo.Active
174071 ± 9% -21.4% 136806 ± 2% meminfo.Active(anon)
311891 +10.9% 346028 meminfo.AnonPages
343079 +42.0% 487127 meminfo.Inactive
342068 +42.1% 486072 meminfo.Inactive(anon)
69414 ± 2% +221.8% 223379 ± 2% meminfo.Mapped
204255 ± 8% +24.9% 255031 meminfo.Shmem
32528 ± 48% +147.6% 80547 ± 38% numa-meminfo.node0.AnonHugePages
92821 ± 23% +59.3% 147839 ± 28% numa-meminfo.node0.AnonPages
99694 ± 17% +56.9% 156414 ± 26% numa-meminfo.node0.Inactive
99136 ± 17% +57.7% 156290 ± 26% numa-meminfo.node0.Inactive(anon)
30838 ± 53% +134.0% 72155 ± 21% numa-meminfo.node0.Mapped
171865 ± 9% -22.7% 132920 ± 2% numa-meminfo.node1.Active
171791 ± 9% -22.7% 132730 ± 2% numa-meminfo.node1.Active(anon)
243260 ± 7% +36.0% 330799 ± 11% numa-meminfo.node1.Inactive
242807 ± 7% +35.9% 329868 ± 11% numa-meminfo.node1.Inactive(anon)
38681 ± 37% +291.2% 151319 ± 8% numa-meminfo.node1.Mapped
195654 ± 8% +27.6% 249732 numa-meminfo.node1.Shmem
23192 ± 23% +59.3% 36946 ± 28% numa-vmstat.node0.nr_anon_pages
24771 ± 17% +57.7% 39074 ± 26% numa-vmstat.node0.nr_inactive_anon
7625 ± 53% +136.5% 18031 ± 21% numa-vmstat.node0.nr_mapped
24771 ± 17% +57.8% 39085 ± 26% numa-vmstat.node0.nr_zone_inactive_anon
8.252e+08 -95.5% 37466761 numa-vmstat.node0.numa_hit
8.251e+08 -95.5% 37386737 numa-vmstat.node0.numa_local
43036 ± 9% -23.1% 33107 ± 2% numa-vmstat.node1.nr_active_anon
60590 ± 7% +36.1% 82475 ± 11% numa-vmstat.node1.nr_inactive_anon
9533 ± 38% +297.1% 37858 ± 8% numa-vmstat.node1.nr_mapped
48889 ± 8% +27.6% 62403 numa-vmstat.node1.nr_shmem
43036 ± 9% -23.1% 33107 ± 2% numa-vmstat.node1.nr_zone_active_anon
60589 ± 7% +36.0% 82430 ± 11% numa-vmstat.node1.nr_zone_inactive_anon
8.406e+08 -95.5% 38196529 ± 7% numa-vmstat.node1.numa_hit
8.405e+08 -95.5% 38161808 ± 7% numa-vmstat.node1.numa_local
43513 ± 9% -21.8% 34042 ± 2% proc-vmstat.nr_active_anon
77940 +11.0% 86526 proc-vmstat.nr_anon_pages
762952 +1.7% 775553 proc-vmstat.nr_file_pages
85507 +42.1% 121487 proc-vmstat.nr_inactive_anon
17361 ± 2% +221.5% 55823 ± 2% proc-vmstat.nr_mapped
3300 +4.6% 3452 proc-vmstat.nr_page_table_pages
51081 ± 8% +24.6% 63669 proc-vmstat.nr_shmem
43513 ± 9% -21.8% 34042 ± 2% proc-vmstat.nr_zone_active_anon
85507 +42.1% 121480 proc-vmstat.nr_zone_inactive_anon
23080 ± 20% +56.7% 36156 ± 11% proc-vmstat.numa_hint_faults
16266 ± 13% +75.2% 28496 ± 5% proc-vmstat.numa_hint_faults_local
1.666e+09 -95.5% 75751403 ± 3% proc-vmstat.numa_hit
63.17 ± 50% +2948.3% 1925 proc-vmstat.numa_huge_pte_updates
1.666e+09 -95.5% 75551968 ± 4% proc-vmstat.numa_local
176965 ± 9% +543.0% 1137910 proc-vmstat.numa_pte_updates
160517 ± 3% -14.3% 137522 ± 2% proc-vmstat.pgactivate
1.665e+09 -95.5% 75663487 ± 4% proc-vmstat.pgalloc_normal
8.332e+08 -95.4% 38289978 ± 4% proc-vmstat.pgfault
1.665e+09 -95.5% 75646557 ± 4% proc-vmstat.pgfree
18.00 +2.1e+08% 37369911 ± 4% proc-vmstat.thp_fault_alloc
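Two counters above make the mechanism explicit (the arithmetic is
ours): after the patch nearly every remaining fault allocates a huge
page, since thp_fault_alloc / pgfault = 37369911 / 38289978 ~= 0.98,
whereas before the patch thp_fault_alloc was essentially zero (18).
Each fault therefore populates and clears 512x as much memory, which
is consistent with the turbostat.RAMWatt increase above.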
1.46 ±223% +1.3e+06% 19552 sched_debug.cfs_rq:/.MIN_vruntime.avg
187.51 ±223% +1.3e+06% 2502777 sched_debug.cfs_rq:/.MIN_vruntime.max
16.51 ±223% +1.3e+06% 220350 sched_debug.cfs_rq:/.MIN_vruntime.stddev
233.78 ± 28% +45.4% 339.81 ± 29% sched_debug.cfs_rq:/.load_avg.max
1.46 ±223% +1.3e+06% 19552 sched_debug.cfs_rq:/.max_vruntime.avg
187.51 ±223% +1.3e+06% 2502777 sched_debug.cfs_rq:/.max_vruntime.max
16.51 ±223% +1.3e+06% 220350 sched_debug.cfs_rq:/.max_vruntime.stddev
20463200 -12.7% 17863314 sched_debug.cfs_rq:/.min_vruntime.min
227934 ± 6% +73.5% 395381 ± 6% sched_debug.cfs_rq:/.min_vruntime.stddev
557786 ± 6% +22.1% 680843 ± 7% sched_debug.cfs_rq:/.spread0.max
-668417 +343.1% -2961726 sched_debug.cfs_rq:/.spread0.min
227979 ± 6% +73.4% 395300 ± 6% sched_debug.cfs_rq:/.spread0.stddev
793.86 ± 3% -31.1% 546.72 ± 9% sched_debug.cfs_rq:/.util_avg.min
57.90 ± 8% +47.0% 85.09 ± 11% sched_debug.cfs_rq:/.util_avg.stddev
535.54 ± 3% -18.1% 438.80 sched_debug.cfs_rq:/.util_est_enqueued.avg
224.57 ± 5% -38.5% 138.22 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
957251 +12.9% 1080580 ± 11% sched_debug.cpu.avg_idle.avg
10.65 ± 5% +303.8% 43.02 ± 21% sched_debug.cpu.clock.stddev
644.97 +46.0% 941.70 ± 5% sched_debug.cpu.clock_task.stddev
0.00 ± 13% +107.1% 0.00 ± 18% sched_debug.cpu.next_balance.stddev
1802 ± 10% -19.3% 1454 ± 13% sched_debug.cpu.nr_switches.min
3.28 ± 13% +8839.1% 293.65 ± 3% perf-stat.i.MPKI
1.936e+10 -81.9% 3.503e+09 perf-stat.i.branch-instructions
0.14 -0.0 0.13 ± 3% perf-stat.i.branch-miss-rate%
26013221 -82.5% 4556180 ± 2% perf-stat.i.branch-misses
5.62 ± 5% +46.1 51.68 ± 3% perf-stat.i.cache-miss-rate%
15513099 ± 8% +14150.9% 2.211e+09 perf-stat.i.cache-misses
2.787e+08 ± 13% +1437.2% 4.284e+09 ± 4% perf-stat.i.cache-references
1870 -3.6% 1803 perf-stat.i.context-switches
3.87 +474.5% 22.23 perf-stat.i.cpi
3.288e+11 -1.4% 3.242e+11 perf-stat.i.cpu-cycles
174.70 -10.3% 156.69 perf-stat.i.cpu-migrations
21352 ± 8% -99.3% 159.27 ± 17% perf-stat.i.cycles-between-cache-misses
0.01 -0.0 0.00 ± 11% perf-stat.i.dTLB-load-miss-rate%
2874326 ± 2% -94.7% 152528 ± 10% perf-stat.i.dTLB-load-misses
2.047e+10 -81.3% 3.825e+09 perf-stat.i.dTLB-loads
0.25 -0.2 0.06 perf-stat.i.dTLB-store-miss-rate%
19343669 -95.4% 891050 ± 4% perf-stat.i.dTLB-store-misses
7.829e+09 -80.1% 1.561e+09 ± 3% perf-stat.i.dTLB-stores
8.49e+10 -82.8% 1.463e+10 perf-stat.i.instructions
0.26 -82.2% 0.05 perf-stat.i.ipc
0.14 ± 38% +60.3% 0.22 ± 12% perf-stat.i.major-faults
2.57 -1.4% 2.53 perf-stat.i.metric.GHz
265.58 +203.4% 805.78 ± 15% perf-stat.i.metric.K/sec
374.50 -68.2% 119.04 perf-stat.i.metric.M/sec
2757302 -95.4% 126231 ± 4% perf-stat.i.minor-faults
92.63 +3.9 96.51 perf-stat.i.node-load-miss-rate%
3007607 ± 4% +285.4% 11591077 ± 6% perf-stat.i.node-load-misses
240194 ± 17% +71.9% 412981 ± 6% perf-stat.i.node-loads
97.87 -92.4 5.47 ± 7% perf-stat.i.node-store-miss-rate%
5503394 +2009.9% 1.161e+08 ± 7% perf-stat.i.node-store-misses
119412 ± 6% +1.7e+06% 2.041e+09 perf-stat.i.node-stores
2757302 -95.4% 126231 ± 4% perf-stat.i.page-faults
3.28 ± 13% +8826.2% 293.21 ± 3% perf-stat.overall.MPKI
0.13 -0.0 0.13 ± 2% perf-stat.overall.branch-miss-rate%
5.61 ± 5% +46.1 51.70 ± 3% perf-stat.overall.cache-miss-rate%
3.87 +473.3% 22.20 perf-stat.overall.cpi
21335 ± 8% -99.3% 146.65 perf-stat.overall.cycles-between-cache-misses
0.01 -0.0 0.00 ± 9% perf-stat.overall.dTLB-load-miss-rate%
0.25 -0.2 0.06 perf-stat.overall.dTLB-store-miss-rate%
0.26 -82.6% 0.05 perf-stat.overall.ipc
92.65 +4.0 96.63 perf-stat.overall.node-load-miss-rate%
97.88 -92.5 5.38 ± 8% perf-stat.overall.node-store-miss-rate%
9272709 +283.9% 35600802 ± 3% perf-stat.overall.path-length
1.929e+10 -81.9% 3.487e+09 perf-stat.ps.branch-instructions
25928796 -82.7% 4477103 ± 2% perf-stat.ps.branch-misses
15464091 ± 8% +14157.2% 2.205e+09 perf-stat.ps.cache-misses
2.778e+08 ± 13% +1437.1% 4.27e+09 ± 4% perf-stat.ps.cache-references
1865 -3.9% 1791 perf-stat.ps.context-switches
3.277e+11 -1.4% 3.233e+11 perf-stat.ps.cpu-cycles
174.25 -11.7% 153.93 perf-stat.ps.cpu-migrations
2866660 ± 2% -94.7% 151686 ± 10% perf-stat.ps.dTLB-load-misses
2.041e+10 -81.3% 3.808e+09 perf-stat.ps.dTLB-loads
19279774 -95.4% 888826 ± 4% perf-stat.ps.dTLB-store-misses
7.803e+09 -80.1% 1.555e+09 ± 3% perf-stat.ps.dTLB-stores
8.462e+10 -82.8% 1.456e+10 perf-stat.ps.instructions
0.14 ± 38% +56.7% 0.21 ± 14% perf-stat.ps.major-faults
2748185 -95.4% 125830 ± 4% perf-stat.ps.minor-faults
2998556 ± 4% +291.3% 11734146 ± 6% perf-stat.ps.node-load-misses
239400 ± 17% +70.8% 408868 ± 7% perf-stat.ps.node-loads
5485289 +2007.4% 1.156e+08 ± 7% perf-stat.ps.node-store-misses
119090 ± 6% +1.7e+06% 2.035e+09 perf-stat.ps.node-stores
2748185 -95.4% 125831 ± 4% perf-stat.ps.page-faults
2.564e+13 -82.8% 4.417e+12 perf-stat.total.instructions
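As a cross-check of the headline slowdown (again our arithmetic,
derived from the counters above): cycles stay roughly flat while
retired instructions collapse, so

	cpi_after  = cpu-cycles / instructions = 3.242e+11 / 1.463e+10 ~= 22.2
	cpi_before =                             3.288e+11 / 8.490e+10 ~= 3.87

matching perf-stat.i.cpi. In other words, the machine burns the same
cycles but retires ~83% fewer instructions, with most of the time now
spent clearing pages (see the profile below).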
95.23 -79.8 15.41 ± 6% perf-profile.calltrace.cycles-pp.__munmap
95.08 -79.7 15.40 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
95.02 -79.6 15.39 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
94.96 -79.6 15.37 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
94.95 -79.6 15.37 ± 6% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
94.86 -79.5 15.35 ± 6% perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
94.38 -79.2 15.22 ± 6% perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
42.74 -42.7 0.00 perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
42.74 -42.7 0.00 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
42.72 -42.7 0.00 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
41.84 -41.8 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
41.70 -41.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
41.62 -41.6 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
41.55 -41.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
41.52 -41.5 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
41.28 -41.3 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
46.93 -37.0 9.94 ± 6% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
43.64 -33.8 9.84 ± 6% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
43.40 -33.6 9.81 ± 6% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.__do_munmap
0.00 +0.6 0.56 ± 6% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.00 +0.6 0.57 ± 6% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
0.00 +0.6 0.60 ± 6% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.00 +0.7 0.67 ± 5% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.00 +0.7 0.74 ± 5% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms
0.00 +0.8 0.77 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page
0.00 +0.8 0.81 ± 4% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page
0.00 +1.1 1.09 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault
0.00 +1.1 1.11 ± 3% perf-profile.calltrace.cycles-pp.free_pcp_prepare.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
3.60 ± 3% +1.5 5.08 ± 7% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
3.51 ± 3% +1.6 5.06 ± 7% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
3.29 ± 3% +1.7 5.04 ± 7% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
0.00 +2.8 2.78 ± 2% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +3.3 3.28 ± 2% perf-profile.calltrace.cycles-pp.__might_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +4.2 4.21 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.zap_huge_pmd.zap_pmd_range
0.00 +4.2 4.21 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.zap_huge_pmd
0.00 +4.3 4.34 ± 8% perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.pte_alloc_one
0.00 +4.4 4.40 ± 8% perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.pte_alloc_one.__do_huge_pmd_anonymous_page
0.00 +4.4 4.43 ± 8% perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc
0.00 +4.5 4.49 ± 8% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.pte_alloc_one.__do_huge_pmd_anonymous_page.__handle_mm_fault
0.00 +4.5 4.51 ± 8% perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio
0.00 +4.6 4.59 ± 8% perf-profile.calltrace.cycles-pp.__alloc_pages.pte_alloc_one.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +4.6 4.62 ± 8% perf-profile.calltrace.cycles-pp.pte_alloc_one.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +4.6 4.63 ± 8% perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.zap_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +4.7 4.70 ± 7% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio.do_huge_pmd_anonymous_page
0.00 +4.7 4.72 ± 7% perf-profile.calltrace.cycles-pp.__alloc_pages.__folio_alloc.vma_alloc_folio.do_huge_pmd_anonymous_page.__handle_mm_fault
0.00 +4.7 4.73 ± 7% perf-profile.calltrace.cycles-pp.__folio_alloc.vma_alloc_folio.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +4.8 4.75 ± 7% perf-profile.calltrace.cycles-pp.vma_alloc_folio.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +4.8 4.76 ± 8% perf-profile.calltrace.cycles-pp.free_unref_page.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas
0.00 +4.8 4.82 ± 7% perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00 +4.9 4.88 ± 7% perf-profile.calltrace.cycles-pp.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
0.00 +8.2 8.22 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist
0.00 +8.2 8.23 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
0.00 +8.3 8.35 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages
0.00 +8.3 8.35 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush
0.00 +8.4 8.37 ± 8% perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
0.00 +9.6 9.60 ± 6% perf-profile.calltrace.cycles-pp.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
0.00 +65.5 65.48 ± 2% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +72.5 72.51 ± 2% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +78.6 78.58 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.62 ± 3% +80.9 83.56 perf-profile.calltrace.cycles-pp.asm_exc_page_fault
2.60 ± 3% +81.0 83.56 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
2.58 ± 3% +81.0 83.57 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.38 ± 3% +81.1 83.52 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.26 ± 3% +81.2 83.45 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
83.48 -83.4 0.06 ± 9% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
83.28 -83.2 0.08 ± 8% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
96.34 -80.3 16.09 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
96.27 -80.2 16.08 ± 6% perf-profile.children.cycles-pp.do_syscall_64
95.28 -79.9 15.41 ± 6% perf-profile.children.cycles-pp.__munmap
94.96 -79.6 15.37 ± 6% perf-profile.children.cycles-pp.__x64_sys_munmap
94.96 -79.6 15.37 ± 6% perf-profile.children.cycles-pp.__vm_munmap
94.87 -79.5 15.36 ± 6% perf-profile.children.cycles-pp.__do_munmap
94.39 -79.2 15.22 ± 6% perf-profile.children.cycles-pp.unmap_region
82.88 -62.0 20.90 ± 8% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
42.78 -42.8 0.00 perf-profile.children.cycles-pp.lru_add_drain
42.76 -42.8 0.00 perf-profile.children.cycles-pp.lru_add_drain_cpu
42.75 -42.6 0.10 perf-profile.children.cycles-pp.folio_batch_move_lru
46.94 -37.0 9.94 ± 6% perf-profile.children.cycles-pp.tlb_finish_mmu
43.64 -33.8 9.84 ± 6% perf-profile.children.cycles-pp.tlb_batch_pages_flush
43.62 -33.8 9.82 ± 6% perf-profile.children.cycles-pp.release_pages
3.21 ± 4% -3.1 0.09 ± 5% perf-profile.children.cycles-pp.flush_tlb_mm_range
3.11 ± 4% -3.1 0.06 ± 8% perf-profile.children.cycles-pp.flush_tlb_func
3.00 ± 4% -3.0 0.03 ± 70% perf-profile.children.cycles-pp.native_flush_tlb_one_user
1.33 ± 3% -0.9 0.42 ± 5% perf-profile.children.cycles-pp.__mmap
1.10 ± 4% -0.7 0.39 ± 5% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.79 ± 4% -0.7 0.09 ± 7% perf-profile.children.cycles-pp.uncharge_batch
0.97 ± 4% -0.6 0.36 ± 5% perf-profile.children.cycles-pp.do_mmap
0.65 ± 5% -0.6 0.06 ± 8% perf-profile.children.cycles-pp.page_counter_uncharge
0.64 ± 4% -0.6 0.05 ± 8% perf-profile.children.cycles-pp.free_pgd_range
0.62 ± 4% -0.6 0.05 perf-profile.children.cycles-pp.free_p4d_range
0.81 ± 3% -0.5 0.30 ± 6% perf-profile.children.cycles-pp.mmap_region
0.59 ± 4% -0.5 0.08 ± 6% perf-profile.children.cycles-pp.kmem_cache_alloc
0.66 ± 5% -0.5 0.16 ± 4% perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.55 ± 3% -0.5 0.06 ± 9% perf-profile.children.cycles-pp.__anon_vma_prepare
0.44 ± 4% -0.4 0.06 ± 9% perf-profile.children.cycles-pp.lru_add_fn
0.42 ± 5% -0.3 0.12 ± 4% perf-profile.children.cycles-pp.free_pgtables
0.40 ± 4% -0.3 0.10 ± 9% perf-profile.children.cycles-pp.kmem_cache_free
0.41 ± 5% -0.3 0.11 ± 5% perf-profile.children.cycles-pp.unlink_anon_vmas
0.25 ± 5% -0.2 0.03 ± 70% perf-profile.children.cycles-pp.vm_area_alloc
0.42 ± 6% -0.2 0.26 ± 3% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.16 ± 5% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.__put_anon_vma
0.27 ± 4% -0.1 0.13 ± 3% perf-profile.children.cycles-pp.native_irq_return_iret
0.28 ± 5% -0.1 0.14 ± 7% perf-profile.children.cycles-pp.perf_event_mmap
0.18 ± 4% -0.1 0.07 ± 7% perf-profile.children.cycles-pp.page_add_new_anon_rmap
0.23 ± 3% -0.1 0.14 ± 7% perf-profile.children.cycles-pp.perf_event_mmap_event
0.18 ± 4% -0.1 0.08 ± 5% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
0.16 ± 5% -0.1 0.07 perf-profile.children.cycles-pp.page_remove_rmap
0.13 ± 2% -0.1 0.05 ± 7% perf-profile.children.cycles-pp.get_unmapped_area
0.15 ± 5% -0.0 0.10 ± 9% perf-profile.children.cycles-pp.perf_iterate_sb
0.10 ± 5% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.find_vma
0.39 ± 3% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.rcu_all_qs
0.09 ± 5% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.__perf_sw_event
0.15 ± 4% -0.0 0.13 ± 7% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.07 -0.0 0.06 ± 8% perf-profile.children.cycles-pp.___perf_sw_event
0.12 ± 5% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.__mod_lruvec_state
0.00 +0.1 0.05 perf-profile.children.cycles-pp.perf_output_sample
0.00 +0.1 0.05 perf-profile.children.cycles-pp.memcg_check_events
0.08 ± 4% +0.1 0.13 ± 5% perf-profile.children.cycles-pp.__mod_node_page_state
0.00 +0.1 0.06 ± 8% perf-profile.children.cycles-pp.__get_user_nocheck_8
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.perf_callchain_user
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.update_load_avg
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.__orc_find
0.19 ± 4% +0.1 0.26 ± 6% perf-profile.children.cycles-pp.__list_del_entry_valid
0.11 ± 4% +0.1 0.18 ± 7% perf-profile.children.cycles-pp.unwind_next_frame
0.00 +0.1 0.08 ± 7% perf-profile.children.cycles-pp.__page_cache_release
0.02 ± 99% +0.1 0.11 ± 4% perf-profile.children.cycles-pp.folio_add_lru
0.00 +0.1 0.08 ± 13% perf-profile.children.cycles-pp.shmem_alloc_and_acct_folio
0.00 +0.1 0.08 ± 13% perf-profile.children.cycles-pp.shmem_alloc_folio
0.00 +0.1 0.09 ± 10% perf-profile.children.cycles-pp.__unwind_start
0.00 +0.1 0.09 ± 11% perf-profile.children.cycles-pp.shmem_write_begin
0.00 +0.1 0.09 ± 11% perf-profile.children.cycles-pp.shmem_getpage_gfp
0.00 +0.1 0.10 ± 6% perf-profile.children.cycles-pp.free_compound_page
0.00 +0.1 0.10 ± 6% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.13 ± 4% +0.1 0.23 ± 7% perf-profile.children.cycles-pp.perf_callchain_kernel
0.13 ± 17% +0.1 0.24 ± 15% perf-profile.children.cycles-pp.cmd_record
0.13 ± 15% +0.1 0.24 ± 16% perf-profile.children.cycles-pp.__libc_start_main
0.13 ± 15% +0.1 0.24 ± 16% perf-profile.children.cycles-pp.main
0.13 ± 15% +0.1 0.24 ± 16% perf-profile.children.cycles-pp.run_builtin
0.04 ± 47% +0.1 0.16 ± 11% perf-profile.children.cycles-pp.generic_perform_write
0.04 ± 47% +0.1 0.16 ± 12% perf-profile.children.cycles-pp.generic_file_write_iter
0.04 ± 47% +0.1 0.16 ± 12% perf-profile.children.cycles-pp.__generic_file_write_iter
0.04 ± 47% +0.1 0.16 ± 14% perf-profile.children.cycles-pp.record__pushfn
0.04 ± 47% +0.1 0.16 ± 14% perf-profile.children.cycles-pp.__libc_write
0.04 ± 47% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.vfs_write
0.04 ± 47% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.perf_mmap__push
0.04 ± 47% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.ksys_write
0.04 ± 47% +0.1 0.17 ± 12% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.13 ± 17% +0.1 0.27 ± 17% perf-profile.children.cycles-pp.__cmd_record
0.15 ± 3% +0.2 0.30 ± 7% perf-profile.children.cycles-pp.get_perf_callchain
0.15 ± 3% +0.2 0.31 ± 6% perf-profile.children.cycles-pp.perf_callchain
0.16 ± 6% +0.2 0.33 ± 7% perf-profile.children.cycles-pp.perf_prepare_sample
0.17 ± 4% +0.2 0.41 ± 6% perf-profile.children.cycles-pp.perf_event_output_forward
0.17 ± 4% +0.2 0.41 ± 6% perf-profile.children.cycles-pp.__perf_event_overflow
0.00 +0.3 0.27 ± 8% perf-profile.children.cycles-pp.__free_one_page
0.18 ± 5% +0.3 0.44 ± 6% perf-profile.children.cycles-pp.perf_tp_event
0.18 ± 5% +0.3 0.46 ± 6% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.19 ± 5% +0.3 0.53 ± 6% perf-profile.children.cycles-pp.update_curr
0.21 ± 3% +0.4 0.62 ± 5% perf-profile.children.cycles-pp.task_tick_fair
0.00 +0.4 0.42 ± 8% perf-profile.children.cycles-pp.check_new_pages
0.23 ± 3% +0.5 0.72 ± 5% perf-profile.children.cycles-pp.scheduler_tick
0.25 ± 4% +0.6 0.83 ± 5% perf-profile.children.cycles-pp.update_process_times
0.25 ± 4% +0.6 0.84 ± 5% perf-profile.children.cycles-pp.tick_sched_handle
0.26 ± 4% +0.6 0.87 ± 5% perf-profile.children.cycles-pp.tick_sched_timer
0.29 ± 3% +0.7 0.98 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.34 ± 4% +0.7 1.09 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.36 ± 4% +0.8 1.12 ± 4% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.38 ± 4% +0.8 1.18 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.44 ± 5% +1.1 1.51 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.00 +1.1 1.14 ± 2% perf-profile.children.cycles-pp.free_pcp_prepare
3.60 ± 3% +1.5 5.08 ± 7% perf-profile.children.cycles-pp.unmap_vmas
3.52 ± 3% +1.6 5.07 ± 7% perf-profile.children.cycles-pp.unmap_page_range
3.43 ± 3% +1.6 5.05 ± 7% perf-profile.children.cycles-pp.zap_pmd_range
0.74 ± 3% +1.7 2.42 ± 2% perf-profile.children.cycles-pp.__cond_resched
0.79 ± 4% +2.9 3.67 ± 2% perf-profile.children.cycles-pp.__might_resched
0.73 ± 3% +3.9 4.63 ± 8% perf-profile.children.cycles-pp.pte_alloc_one
0.32 ± 5% +4.5 4.84 ± 7% perf-profile.children.cycles-pp.vma_alloc_folio
0.27 ± 5% +4.5 4.82 ± 7% perf-profile.children.cycles-pp.__folio_alloc
0.00 +4.8 4.82 ± 7% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
0.00 +4.9 4.88 ± 7% perf-profile.children.cycles-pp.zap_huge_pmd
0.75 ± 4% +8.7 9.41 ± 7% perf-profile.children.cycles-pp.__alloc_pages
0.16 ± 5% +8.8 9.00 ± 8% perf-profile.children.cycles-pp.rmqueue
0.00 +8.9 8.86 ± 8% perf-profile.children.cycles-pp.rmqueue_bulk
0.42 ± 5% +8.9 9.28 ± 7% perf-profile.children.cycles-pp.get_page_from_freelist
0.00 +13.0 13.02 ± 8% perf-profile.children.cycles-pp.free_pcppages_bulk
0.00 +14.4 14.36 ± 7% perf-profile.children.cycles-pp.free_unref_page
0.12 ± 3% +20.9 21.00 ± 8% perf-profile.children.cycles-pp._raw_spin_lock
0.18 ± 3% +65.9 66.08 ± 2% perf-profile.children.cycles-pp.clear_page_erms
0.00 +73.5 73.51 ± 2% perf-profile.children.cycles-pp.clear_huge_page
0.00 +78.6 78.58 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
2.65 ± 3% +81.0 83.62 perf-profile.children.cycles-pp.asm_exc_page_fault
2.61 ± 3% +81.0 83.59 perf-profile.children.cycles-pp.exc_page_fault
2.60 ± 3% +81.0 83.59 perf-profile.children.cycles-pp.do_user_addr_fault
2.39 ± 3% +81.1 83.53 perf-profile.children.cycles-pp.handle_mm_fault
2.27 ± 3% +81.2 83.46 perf-profile.children.cycles-pp.__handle_mm_fault
82.87 -62.0 20.90 ± 8% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
2.98 ± 4% -2.9 0.03 ± 70% perf-profile.self.cycles-pp.native_flush_tlb_one_user
0.72 ± 3% -0.6 0.08 ± 10% perf-profile.self.cycles-pp.zap_pmd_range
0.50 ± 5% -0.5 0.03 ± 70% perf-profile.self.cycles-pp.page_counter_uncharge
0.41 ± 4% -0.3 0.06 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.27 ± 4% -0.1 0.13 ± 3% perf-profile.self.cycles-pp.native_irq_return_iret
0.20 ± 6% -0.1 0.08 ± 8% perf-profile.self.cycles-pp.kmem_cache_free
0.22 ± 3% -0.1 0.14 ± 3% perf-profile.self.cycles-pp.rcu_all_qs
0.08 ± 5% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.try_charge_memcg
0.02 ±141% +0.1 0.07 ± 5% perf-profile.self.cycles-pp.unwind_next_frame
0.08 ± 6% +0.1 0.13 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.00 +0.1 0.06 ± 8% perf-profile.self.cycles-pp.__do_huge_pmd_anonymous_page
0.00 +0.1 0.06 ± 8% perf-profile.self.cycles-pp.page_counter_try_charge
0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.__orc_find
0.19 ± 3% +0.1 0.26 ± 8% perf-profile.self.cycles-pp.__list_del_entry_valid
0.08 ± 7% +0.1 0.19 ± 8% perf-profile.self.cycles-pp.get_page_from_freelist
0.00 +0.3 0.25 ± 8% perf-profile.self.cycles-pp.__free_one_page
0.00 +0.4 0.42 ± 8% perf-profile.self.cycles-pp.check_new_pages
0.00 +1.1 1.10 ± 2% perf-profile.self.cycles-pp.free_pcp_prepare
0.42 ± 4% +1.2 1.59 ± 2% perf-profile.self.cycles-pp.__cond_resched
0.00 +2.5 2.47 ± 2% perf-profile.self.cycles-pp.clear_huge_page
0.70 ± 4% +2.7 3.36 ± 2% perf-profile.self.cycles-pp.__might_resched
0.18 ± 4% +65.0 65.14 ± 2% perf-profile.self.cycles-pp.clear_page_erms
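The profile flip is stark: before the patch ~83% of cycles were spent
spinning on LRU/zone locks (native_queued_spin_lock_slowpath), while
after it ~65% go to clear_page_erms under clear_huge_page, i.e.
zeroing the newly allocated 2 MiB pages. The kernel clears a huge page
subpage by subpage with a resched check each time, which is also where
the __cond_resched/__might_resched entries come from. A rough sketch
of that loop shape (illustrative only; the real mm code additionally
orders subpages for cache locality):

	/* Illustrative shape of huge-page clearing, not the exact kernel code. */
	for (i = 0; i < pages_per_huge_page; i++) {
		cond_resched();	/* the __cond_resched samples above */
		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
	}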
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
Attachments:
- config-6.0.0-rc3-00017-gf35b5d7d676e (text/plain, 163956 bytes)
- job-script (text/plain, 8099 bytes)
- job.yaml (text/plain, 5611 bytes)
- reproduce (text/plain, 348 bytes)