Message-ID: <202408221014.cf0a2408-oliver.sang@intel.com>
Date: Thu, 22 Aug 2024 13:30:36 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Zhang Qiao <zhangqiao22@...wei.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
<ying.huang@...el.com>, <feng.tang@...el.com>, <fengwei.yin@...el.com>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: [tip:sched/core] [sched] c40dd90ac0:
stress-ng.mlockmany.ops_per_sec 6.9% improvement
Hello,
Chen Yu (Cc'ed) helped review this result and provided the information below:
Firstly, it seems that this commit has increased the context-switch rate of the
test, and the overloaded stress-ng.mlockmany workload benefits from more context
switches, judging by the test score.

Secondly, mlockmany stresses forking and exiting, and this commit fixed the
vruntime of a forked task, so I think this commit is first of all a fix, and the
performance improvement is a 'side effect'.
Thanks a lot, Chen Yu!

Below is the full report, FYI.
kernel test robot noticed a 6.9% improvement of stress-ng.mlockmany.ops_per_sec on:
commit: c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c ("sched: Initialize the vruntime of a new task when it is first enqueued")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: mlockmany
cpufreq_governor: performance
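For reference, the workload can be approximated locally with a plain stress-ng invocation (the exact lkp job file and wrapper differ; this is only a sketch, assuming stress-ng is installed):

```shell
# Run the mlockmany stressor with one instance per online CPU
# (count 0 = all CPUs) for 60s and report ops and ops/sec.
stress-ng --mlockmany 0 --timeout 60s --metrics-brief
```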
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240822/202408221014.cf0a2408-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mlockmany/stress-ng/60s
commit:
fe7a11c78d ("sched/core: Fix unbalance set_rq_online/offline() in sched_cpu_deactivate()")
c40dd90ac0 ("sched: Initialize the vruntime of a new task when it is first enqueued")
fe7a11c78d2a9bdb c40dd90ac045fa1fdf6acc5bf91
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.33 ± 6% +0.1 0.39 ± 4% mpstat.cpu.all.soft%
24850 +12.5% 27952 vmstat.system.cs
321532 ± 3% -5.1% 305193 ± 2% proc-vmstat.nr_mlock
1094291 -1.5% 1077939 proc-vmstat.nr_unevictable
1094289 -1.5% 1077939 proc-vmstat.nr_zone_unevictable
1203 ± 7% +14.6% 1378 ± 5% sched_debug.cfs_rq:/.util_avg.max
124070 ± 4% +8.4% 134473 ± 2% sched_debug.cpu.curr->pid.avg
11329 ± 3% +10.4% 12512 ± 4% sched_debug.cpu.nr_switches.min
269151 +6.9% 287597 stress-ng.mlockmany.ops
4482 +6.9% 4789 stress-ng.mlockmany.ops_per_sec
1013700 +13.7% 1152838 stress-ng.time.involuntary_context_switches
321785 +6.0% 341166 stress-ng.time.voluntary_context_switches
0.24 +0.0 0.26 ± 3% perf-stat.i.branch-miss-rate%
22351186 +8.7% 24290613 ± 3% perf-stat.i.branch-misses
78.51 -0.9 77.60 perf-stat.i.cache-miss-rate%
25668 +11.8% 28695 perf-stat.i.context-switches
8.46 ± 2% +8.5% 9.18 ± 2% perf-stat.i.metric.K/sec
271097 ± 2% +8.5% 294019 ± 2% perf-stat.i.minor-faults
271098 ± 2% +8.5% 294022 ± 2% perf-stat.i.page-faults
0.20 ± 44% +0.1 0.26 ± 3% perf-stat.overall.branch-miss-rate%
18381669 ± 44% +30.8% 24045151 ± 3% perf-stat.ps.branch-misses
20985 ± 44% +34.2% 28161 perf-stat.ps.context-switches
1289 ± 44% +23.5% 1591 perf-stat.ps.cpu-migrations
221777 ± 44% +30.5% 289386 ± 2% perf-stat.ps.minor-faults
221778 ± 44% +30.5% 289389 ± 2% perf-stat.ps.page-faults
2.409e+12 ± 44% +21.0% 2.915e+12 perf-stat.total.instructions
23.29 -1.1 22.21 ± 2% perf-profile.calltrace.cycles-pp.mlock_drain_local.populate_vma_page_range.__mm_populate.do_mlock.__x64_sys_mlock
23.14 -1.1 22.06 ± 2% perf-profile.calltrace.cycles-pp.mlock_folio_batch.mlock_drain_local.populate_vma_page_range.__mm_populate.do_mlock
22.98 -1.1 21.90 ± 2% perf-profile.calltrace.cycles-pp.__mlock_new_folio.mlock_folio_batch.mlock_drain_local.populate_vma_page_range.__mm_populate
22.37 -1.1 21.30 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_lruvec_lock_irq.__mlock_new_folio.mlock_folio_batch.mlock_drain_local
22.37 -1.1 21.31 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irq.__mlock_new_folio.mlock_folio_batch.mlock_drain_local.populate_vma_page_range
22.27 -1.1 21.21 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_lruvec_lock_irq.__mlock_new_folio.mlock_folio_batch
46.09 -0.7 45.36 perf-profile.calltrace.cycles-pp.__x64_sys_mlock.do_syscall_64.entry_SYSCALL_64_after_hwframe.mlock
46.09 -0.7 45.36 perf-profile.calltrace.cycles-pp.do_mlock.__x64_sys_mlock.do_syscall_64.entry_SYSCALL_64_after_hwframe.mlock
46.30 -0.7 45.62 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.mlock
46.31 -0.7 45.62 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.mlock
46.38 -0.7 45.70 perf-profile.calltrace.cycles-pp.mlock
0.56 +0.0 0.60 perf-profile.calltrace.cycles-pp.copy_present_ptes.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
0.65 ± 2% +0.1 0.70 ± 2% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.67 ± 3% +0.1 0.72 ± 2% perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.67 ± 2% +0.1 0.72 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.74 ± 3% +0.1 0.80 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.76 ± 3% +0.1 0.83 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.86 ± 2% +0.1 0.93 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
0.86 ± 2% +0.1 0.93 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.90 ± 2% +0.1 0.98 perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.15 +0.1 1.23 ± 2% perf-profile.calltrace.cycles-pp.anon_vma_interval_tree_insert.anon_vma_clone.anon_vma_fork.dup_mmap.dup_mm
0.88 ± 3% +0.1 0.97 perf-profile.calltrace.cycles-pp.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap.dup_mm
1.17 +0.1 1.26 perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.exit_mmap.mmput.exit_mm
1.04 ± 4% +0.1 1.15 perf-profile.calltrace.cycles-pp.copy_p4d_range.copy_page_range.dup_mmap.dup_mm.copy_process
1.06 ± 3% +0.1 1.17 perf-profile.calltrace.cycles-pp.copy_page_range.dup_mmap.dup_mm.copy_process.kernel_clone
1.84 +0.1 1.98 ± 2% perf-profile.calltrace.cycles-pp.anon_vma_clone.anon_vma_fork.dup_mmap.dup_mm.copy_process
1.89 +0.1 2.02 ± 2% perf-profile.calltrace.cycles-pp.free_pgtables.exit_mmap.mmput.exit_mm.do_exit
2.30 ± 2% +0.2 2.46 ± 2% perf-profile.calltrace.cycles-pp.anon_vma_fork.dup_mmap.dup_mm.copy_process.kernel_clone
2.23 +0.2 2.40 ± 2% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
2.29 +0.2 2.47 ± 2% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
2.26 +0.2 2.44 ± 2% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
2.46 +0.2 2.66 ± 2% perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
0.26 ±100% +0.3 0.54 perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
5.10 ± 2% +0.4 5.55 ± 2% perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
5.98 ± 2% +0.5 6.45 ± 2% perf-profile.calltrace.cycles-pp._Fork
5.32 ± 2% +0.5 5.79 ± 2% perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
5.78 ± 2% +0.5 6.26 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
5.78 ± 2% +0.5 6.26 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
5.62 ± 2% +0.5 6.11 ± 2% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.73 ± 2% +0.5 6.23 ± 2% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
5.73 ± 2% +0.5 6.23 ± 2% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
7.30 ± 2% +0.5 7.83 ± 2% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
7.42 ± 2% +0.5 7.96 ± 2% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.41 ± 2% +0.5 7.95 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.41 ± 2% +0.5 7.95 ± 2% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64
7.41 ± 2% +0.5 7.95 ± 2% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.10 +0.5 7.65 ± 2% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.exit_mm.do_exit.do_group_exit
7.43 ± 2% +0.5 7.97 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.43 ± 2% +0.5 7.97 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
7.12 ± 2% +0.5 7.66 ± 2% perf-profile.calltrace.cycles-pp.mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
23.15 -1.0 22.12 ± 2% perf-profile.children.cycles-pp.__mlock_new_folio
46.09 -0.7 45.36 perf-profile.children.cycles-pp.__x64_sys_mlock
46.09 -0.7 45.36 perf-profile.children.cycles-pp.do_mlock
46.40 -0.7 45.72 perf-profile.children.cycles-pp.mlock
96.89 -0.2 96.67 perf-profile.children.cycles-pp.do_syscall_64
96.90 -0.2 96.68 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.13 ± 4% -0.0 0.11 perf-profile.children.cycles-pp.schedule_tail
0.08 ± 5% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.__wp_page_copy_user
0.14 ± 3% +0.0 0.15 perf-profile.children.cycles-pp.obj_cgroup_charge
0.15 ± 3% +0.0 0.16 perf-profile.children.cycles-pp.mas_find
0.23 ± 6% +0.0 0.25 ± 4% perf-profile.children.cycles-pp.___slab_alloc
0.25 ± 3% +0.0 0.28 ± 2% perf-profile.children.cycles-pp.__rb_insert_augmented
0.34 ± 2% +0.0 0.37 ± 2% perf-profile.children.cycles-pp.mod_objcg_state
0.45 +0.0 0.48 ± 2% perf-profile.children.cycles-pp.__memcg_slab_free_hook
0.24 ± 5% +0.0 0.28 ± 6% perf-profile.children.cycles-pp.wp_page_copy
0.32 +0.0 0.36 perf-profile.children.cycles-pp.up_write
0.56 ± 2% +0.0 0.61 perf-profile.children.cycles-pp.copy_present_ptes
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__put_user_4
0.86 +0.1 0.92 perf-profile.children.cycles-pp.lru_add_drain
0.85 +0.1 0.91 perf-profile.children.cycles-pp.folio_batch_move_lru
0.85 +0.1 0.92 perf-profile.children.cycles-pp.lru_add_drain_cpu
0.99 ± 3% +0.1 1.06 ± 3% perf-profile.children.cycles-pp.next_uptodate_folio
0.20 +0.1 0.27 perf-profile.children.cycles-pp.do_wp_page
0.36 ± 2% +0.1 0.43 ± 3% perf-profile.children.cycles-pp.__libc_fork
0.92 ± 2% +0.1 0.99 perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
1.05 +0.1 1.12 perf-profile.children.cycles-pp.down_write
1.28 ± 3% +0.1 1.35 ± 3% perf-profile.children.cycles-pp.filemap_map_pages
1.16 +0.1 1.24 ± 2% perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
0.88 ± 3% +0.1 0.98 perf-profile.children.cycles-pp.copy_pte_range
1.18 +0.1 1.28 ± 2% perf-profile.children.cycles-pp.unlink_anon_vmas
0.59 ± 6% +0.1 0.68 ± 5% perf-profile.children.cycles-pp.__cond_resched
1.05 ± 4% +0.1 1.16 ± 2% perf-profile.children.cycles-pp.copy_p4d_range
1.06 ± 3% +0.1 1.18 ± 2% perf-profile.children.cycles-pp.copy_page_range
0.67 ± 7% +0.1 0.78 ± 5% perf-profile.children.cycles-pp.__schedule
1.85 +0.1 1.98 ± 2% perf-profile.children.cycles-pp.anon_vma_clone
1.91 +0.1 2.04 ± 2% perf-profile.children.cycles-pp.free_pgtables
2.30 ± 2% +0.2 2.46 ± 2% perf-profile.children.cycles-pp.anon_vma_fork
2.17 ± 2% +0.2 2.37 ± 2% perf-profile.children.cycles-pp.do_user_addr_fault
2.18 ± 2% +0.2 2.38 ± 2% perf-profile.children.cycles-pp.exc_page_fault
2.30 ± 2% +0.2 2.51 ± 2% perf-profile.children.cycles-pp.asm_exc_page_fault
5.11 ± 2% +0.4 5.56 ± 2% perf-profile.children.cycles-pp.dup_mmap
5.32 ± 2% +0.5 5.79 ± 2% perf-profile.children.cycles-pp.dup_mm
5.99 ± 2% +0.5 6.46 ± 2% perf-profile.children.cycles-pp._Fork
5.63 ± 2% +0.5 6.12 ± 2% perf-profile.children.cycles-pp.copy_process
5.73 ± 2% +0.5 6.23 ± 2% perf-profile.children.cycles-pp.__do_sys_clone
5.73 ± 2% +0.5 6.23 ± 2% perf-profile.children.cycles-pp.kernel_clone
7.30 ± 2% +0.5 7.84 ± 2% perf-profile.children.cycles-pp.exit_mm
7.13 ± 2% +0.5 7.67 ± 2% perf-profile.children.cycles-pp.mmput
7.11 +0.5 7.66 ± 2% perf-profile.children.cycles-pp.exit_mmap
7.55 ± 2% +0.6 8.10 ± 2% perf-profile.children.cycles-pp.__x64_sys_exit_group
7.55 ± 2% +0.6 8.10 ± 2% perf-profile.children.cycles-pp.do_exit
7.55 ± 2% +0.6 8.10 ± 2% perf-profile.children.cycles-pp.do_group_exit
7.57 ± 2% +0.6 8.12 ± 2% perf-profile.children.cycles-pp.x64_sys_call
0.05 +0.0 0.06 perf-profile.self.cycles-pp.sync_regs
0.16 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.copy_present_ptes
0.29 ± 2% +0.0 0.31 ± 3% perf-profile.self.cycles-pp.__memcg_slab_free_hook
0.24 ± 2% +0.0 0.26 ± 2% perf-profile.self.cycles-pp.__rb_insert_augmented
0.30 +0.0 0.33 ± 2% perf-profile.self.cycles-pp.up_write
0.55 ± 3% +0.0 0.58 perf-profile.self.cycles-pp.zap_pte_range
0.50 +0.0 0.54 ± 2% perf-profile.self.cycles-pp.down_write
0.94 ± 3% +0.1 1.00 ± 3% perf-profile.self.cycles-pp.next_uptodate_folio
1.15 +0.1 1.23 ± 2% perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki