Message-ID: <202409161559.af0a1b99-oliver.sang@intel.com>
Date: Mon, 16 Sep 2024 16:11:48 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Linux Memory Management List
<linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>, "Liam R.
Howlett" <Liam.Howlett@...cle.com>, Mark Brown <broonie@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>, Bert Karwatzki <spasswolf@....de>, Jeff Xu
<jeffxu@...omium.org>, Jiri Olsa <olsajiri@...il.com>, Kees Cook
<kees@...nel.org>, Lorenzo Stoakes <lstoakes@...il.com>, Matthew Wilcox
<willy@...radead.org>, "Paul E. McKenney" <paulmck@...nel.org>, Paul Moore
<paul@...l-moore.com>, Sidhartha Kumar <sidhartha.kumar@...cle.com>, "Suren
Baghdasaryan" <surenb@...gle.com>, <linux-kernel@...r.kernel.org>,
<ying.huang@...el.com>, <feng.tang@...el.com>, <fengwei.yin@...el.com>,
<oliver.sang@...el.com>
Subject: [linux-next:master] [mm] cc8cb3697a: stress-ng.pkey.ops_per_sec
4.4% improvement
Hello,
The kernel test robot noticed a 4.4% improvement in stress-ng.pkey.ops_per_sec on:
commit: cc8cb3697a8d8eabe1fb9acb8768b11c1ab607d8 ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: pkey
cpufreq_governor: performance
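As a rough illustration of what these parameters describe, the sketch below builds an approximately equivalent stand-alone stress-ng command line. It is not the LKP job itself: the helper name `stress_ng_cmd` is hypothetical, and it assumes that `nr_threads: 100%` means one worker per logical CPU (64 on the test machine). The reproduction materials linked in this report remain the authoritative reference.

```python
import os
from typing import List, Optional


def stress_ng_cmd(test: str, nr_threads_pct: int, testtime: str,
                  nr_cpus: Optional[int] = None) -> List[str]:
    """Build an approximate stress-ng argv from LKP-style parameters.

    Assumption: nr_threads is a percentage of the logical CPU count.
    """
    cpus = nr_cpus if nr_cpus is not None else (os.cpu_count() or 1)
    workers = max(1, cpus * nr_threads_pct // 100)
    # --<stressor> N, --timeout and --metrics-brief are standard stress-ng options.
    return ["stress-ng", f"--{test}", str(workers),
            "--timeout", testtime, "--metrics-brief"]


# Parameters from this report; 64 is the thread count of the test machine.
print(" ".join(stress_ng_cmd("pkey", 100, "60s", nr_cpus=64)))
# prints: stress-ng --pkey 64 --timeout 60s --metrics-brief
```

The command is only printed, not executed, and the cpufreq governor setting (performance) has to be applied separately on the machine under test.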
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161559.af0a1b99-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pkey/stress-ng/60s
commit:
65e0aa64df ("mm: introduce commit_merge(), abstracting final commit of merge")
cc8cb3697a ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
65e0aa64df916861 cc8cb3697a8d8eabe1fb9acb876
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
159916 ± 5% +14.9% 183809 ± 10% meminfo.DirectMap4k
15.42 ± 23% +46.5% 22.58 ± 17% sched_debug.cpu.nr_uninterruptible.max
2.158e+08 +4.4% 2.253e+08 stress-ng.pkey.ops
3596484 +4.4% 3755565 stress-ng.pkey.ops_per_sec
196.30 +4.9% 205.86 stress-ng.time.user_time
25782400 +3.4% 26666903 proc-vmstat.numa_hit
25707363 +3.5% 26600006 proc-vmstat.numa_local
44223158 +3.4% 45721027 proc-vmstat.pgalloc_normal
39763569 +3.5% 41151044 proc-vmstat.pgfree
3.568e+10 +1.4% 3.619e+10 perf-stat.i.branch-instructions
87058419 ± 2% +3.1% 89795461 perf-stat.i.branch-misses
1.482e+08 +2.7% 1.521e+08 perf-stat.i.cache-references
1854 -2.2% 1813 perf-stat.i.cycles-between-cache-misses
1.68e+11 +1.1% 1.699e+11 perf-stat.i.instructions
0.64 +1.8% 0.65 perf-stat.overall.MPKI
1812 -2.5% 1766 perf-stat.overall.cycles-between-cache-misses
1.045e+08 +2.6% 1.073e+08 perf-stat.ps.cache-misses
1.446e+08 +2.8% 1.486e+08 perf-stat.ps.cache-references
25.66 ±116% -96.6% 0.86 ±168% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
9.35 ± 40% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
9.63 ± 36% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
3.87 ± 38% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
10.81 ± 36% -76.2% 2.57 ±142% perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
3.74 ± 55% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
2.32 ± 34% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
1.32 ±104% -80.1% 0.26 ±221% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
19.81 ±188% -99.3% 0.14 ±142% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.39 ± 57% -81.6% 0.07 ±153% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
180.87 ±203% -99.1% 1.55 ±153% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.36 ±108% -96.5% 0.01 ±187% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
1.44 ± 19% -85.8% 0.20 ±171% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
40.94 ±115% -99.8% 0.10 ±143% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
112.73 ±118% -98.9% 1.19 ±142% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
335.83 ± 29% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
301.19 ± 35% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
22.34 ± 98% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
473.14 ± 21% -76.2% 112.54 ±144% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
6.84 ± 51% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
7.07 ± 72% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
0.42 ±147% -98.1% 0.01 ±141% perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
2373 ± 40% -78.6% 507.50 ±152% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
1.70 ±111% -96.7% 0.06 ±212% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
1309 ± 77% -99.6% 5.06 ±165% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2745 ± 25% -81.5% 507.97 ±152% perf-sched.total_sch_delay.max.ms
10044 ± 4% -74.3% 2576 ±141% perf-sched.total_wait_and_delay.count.ms
6234 ± 21% -77.2% 1421 ±141% perf-sched.total_wait_and_delay.max.ms
18.71 ± 40% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
19.26 ± 36% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
21.62 ± 36% -76.2% 5.15 ±142% perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
885.96 ± 42% -79.1% 185.28 ±142% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
144.50 ± 24% -86.4% 19.67 ±145% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
131.83 ± 9% -73.7% 34.67 ±141% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
159.83 ± 8% -75.8% 38.67 ±144% perf-sched.wait_and_delay.count.__cond_resched.change_pmd_range.isra.0.change_pud_range
227.83 ± 9% -77.0% 52.33 ±141% perf-sched.wait_and_delay.count.__cond_resched.change_pud_range.isra.0.change_protection_range
75.00 ± 8% -71.6% 21.33 ±143% perf-sched.wait_and_delay.count.__cond_resched.down_write.__x64_sys_pkey_free.do_syscall_64.entry_SYSCALL_64_after_hwframe
82.00 ± 9% -76.4% 19.33 ±141% perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.__split_vma.vma_modify
412.67 ± 7% -82.6% 71.83 ±141% perf-sched.wait_and_delay.count.__cond_resched.down_write.mprotect_fixup.do_mprotect_pkey.__x64_sys_pkey_mprotect
125.83 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
86.83 ± 14% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.vma_merge.constprop.0
225.33 ± 7% -76.1% 53.83 ±142% perf-sched.wait_and_delay.count.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
314.67 ± 31% -87.1% 40.50 ±142% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
118.17 ± 12% -80.0% 23.67 ±143% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
206.50 ± 8% -77.2% 47.17 ±141% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vma_modify
76.33 ± 23% -90.6% 7.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
45.33 ± 21% -83.1% 7.67 ±148% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
626.00 ± 66% -92.6% 46.17 ±142% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
10.33 ± 14% -83.9% 1.67 ±223% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
54.17 ± 27% -70.2% 16.17 ±141% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1976 ± 7% -77.3% 447.67 ±141% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1760 ± 10% -74.8% 443.33 ±147% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
195.50 ± 9% -75.9% 47.17 ±141% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
671.66 ± 29% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
602.38 ± 35% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
946.28 ± 21% -76.2% 225.08 ±144% perf-sched.wait_and_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
4225 ± 39% -75.8% 1022 ±141% perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2837 ± 31% -88.2% 334.64 ±223% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
4535 ± 33% -74.1% 1173 ±143% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
25.66 ±116% -96.6% 0.86 ±168% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
9.36 ± 40% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
9.63 ± 36% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
3.87 ± 38% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
10.81 ± 36% -76.2% 2.57 ±142% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
2.32 ± 34% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
286.97 ±115% -99.8% 0.71 ±182% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.39 ± 57% -81.6% 0.07 ±153% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
705.09 ± 56% -73.9% 183.73 ±142% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
112.73 ±118% -98.9% 1.19 ±142% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
335.83 ± 29% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
301.19 ± 35% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_merge.constprop.0
22.34 ± 98% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
473.14 ± 21% -76.2% 112.54 ±144% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
7.07 ± 72% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
835.83 ±107% -99.8% 1.31 ±200% perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
1536 ± 83% -77.9% 339.71 ±141% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
2836 ± 31% -88.2% 334.51 ±223% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
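The %change column in the table above is consistent with the relative difference between the parent commit's value and the tested commit's value. A minimal sketch of that arithmetic, assuming this definition (the helper `pct_change` is hypothetical and not part of lkp-tests), using three rows taken from the table:

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# (parent commit 65e0aa64df, tested commit cc8cb3697a) values from the table.
rows = {
    "stress-ng.pkey.ops_per_sec": (3596484, 3755565),
    "stress-ng.time.user_time": (196.30, 205.86),
    "perf-stat.i.cycles-between-cache-misses": (1854, 1813),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+.1f}%  {name}")
# prints:
# +4.4%  stress-ng.pkey.ops_per_sec
# +4.9%  stress-ng.time.user_time
# -2.2%  perf-stat.i.cycles-between-cache-misses
```

Rows whose %stddev is large (for example the perf-sched entries with ±141% or more) have means that vary widely between runs, so their %change figures carry much less weight than the headline throughput numbers.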
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki