Message-ID: <202510231459.ad690ecd-lkp@intel.com>
Date: Thu, 23 Oct 2025 15:26:05 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Chen Yu <yu.c.chen@...el.com>,
Tim Chen <tim.c.chen@...ux.intel.com>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <aubrey.li@...ux.intel.com>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi Vineeth
Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, "Shrikanth
Hegde" <sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>,
"Yangyu Chen" <cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern
Hao <vernhao@...cent.com>, Len Brown <len.brown@...el.com>, Aubrey Li
<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
<yu.chen.surf@...il.com>, Libo Chen <libo.chen@...cle.com>, Adam Li
<adamli@...amperecomputing.com>, Tim Chen <tim.c.chen@...el.com>,
<oliver.sang@...el.com>
Subject: Re: [PATCH 01/19] sched/fair: Add infrastructure for cache-aware
load balancing
Hello,
kernel test robot noticed a 5.1% improvement of will-it-scale.per_thread_ops on:
commit: ddf7df94672b42db9a86b3225cf9ebcfdfefc506 ("[PATCH 01/19] sched/fair: Add infrastructure for cache-aware load balancing")
url: https://github.com/intel-lab-lkp/linux/commits/Tim-Chen/sched-fair-Add-infrastructure-for-cache-aware-load-balancing/20251012-022248
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 45b7f780739a3145aeef24d2dfa02517a6c82ed6
patch link: https://lore.kernel.org/all/865b852e3fdef6561c9e0a5be9a94aec8a68cdea.1760206683.git.tim.c.chen@linux.intel.com/
patch subject: [PATCH 01/19] sched/fair: Add infrastructure for cache-aware load balancing
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_task: 100%
	mode: thread
	test: mmap1
	cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251023/202510231459.ad690ecd-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/thread/100%/debian-13-x86_64-20250902.cgz/lkp-icl-2sp7/mmap1/will-it-scale
commit:
  45b7f78073 ("sched: Fix some typos in include/linux/preempt.h")
  ddf7df9467 ("sched/fair: Add infrastructure for cache-aware load balancing")
45b7f780739a3145 ddf7df94672b42db9a86b3225cf
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
12.30 ± 4% +1.2 13.54 ± 2% turbostat.C6%
16510 ± 2% +11.0% 18333 ± 5% perf-c2c.HITM.local
14007 ± 2% +11.9% 15670 ± 5% perf-c2c.HITM.remote
30518 ± 2% +11.4% 34003 ± 5% perf-c2c.HITM.total
4216 ±117% -82.9% 720.83 ± 87% sched_debug.cfs_rq:/.load_avg.max
230030 ± 6% +21.9% 280349 ± 3% sched_debug.cpu.nr_switches.min
32696287 +100.0% 65398751 sched_debug.sysctl_sched.sysctl_sched_features
71075 +5.1% 74710 will-it-scale.64.threads
1109 +5.1% 1166 will-it-scale.per_thread_ops
71075 +5.1% 74710 will-it-scale.workload
20012272 +1.8% 20374679 perf-stat.i.branch-misses
20662368 +4.4% 21568525 perf-stat.i.cache-references
181255 ± 6% +10.5% 200376 ± 2% perf-stat.i.context-switches
177.39 +5.8% 187.65 perf-stat.i.cpu-migrations
22226 ± 3% -5.9% 20924 ± 2% perf-stat.i.cycles-between-cache-misses
2.83 ± 6% +10.5% 3.13 ± 2% perf-stat.i.metric.K/sec
0.26 +0.0 0.27 perf-stat.overall.branch-miss-rate%
1.609e+08 -5.6% 1.518e+08 perf-stat.overall.path-length
19825311 +1.9% 20209803 perf-stat.ps.branch-misses
20712472 +4.4% 21614327 perf-stat.ps.cache-references
180352 ± 6% +10.5% 199276 ± 2% perf-stat.ps.context-switches
177.05 +5.8% 187.32 perf-stat.ps.cpu-migrations
47.47 -0.1 47.36 perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap
48.49 -0.1 48.40 perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
48.39 -0.1 48.30 perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
47.36 -0.1 47.27 perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64
48.44 -0.1 48.35 perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64
48.34 -0.1 48.25 perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
49.70 -0.1 49.62 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
49.70 -0.1 49.62 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
49.71 -0.1 49.64 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
49.73 -0.1 49.66 perf-profile.calltrace.cycles-pp.__munmap
49.71 -0.1 49.64 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
49.11 -0.1 49.06 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.52 +0.0 0.54 perf-profile.calltrace.cycles-pp.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
0.54 ± 2% +0.0 0.56 perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.67 ± 3% +0.1 0.74 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
0.68 ± 3% +0.1 0.74 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.71 ± 3% +0.1 0.78 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.99 ± 3% +0.1 1.10 ± 2% perf-profile.calltrace.cycles-pp.common_startup_64
0.97 ± 3% +0.1 1.08 ± 2% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
0.97 ± 3% +0.1 1.08 ± 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
0.97 ± 3% +0.1 1.08 ± 2% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
94.85 -0.2 94.65 perf-profile.children.cycles-pp.osq_lock
96.81 -0.2 96.64 perf-profile.children.cycles-pp.rwsem_down_write_slowpath
96.87 -0.2 96.70 perf-profile.children.cycles-pp.down_write_killable
98.91 -0.1 98.80 perf-profile.children.cycles-pp.do_syscall_64
98.91 -0.1 98.80 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
49.70 -0.1 49.62 perf-profile.children.cycles-pp.__x64_sys_munmap
49.70 -0.1 49.62 perf-profile.children.cycles-pp.__vm_munmap
49.73 -0.1 49.66 perf-profile.children.cycles-pp.__munmap
0.05 +0.0 0.06 perf-profile.children.cycles-pp.wake_q_add
0.23 +0.0 0.24 perf-profile.children.cycles-pp.kmem_cache_free
0.15 ± 2% +0.0 0.16 perf-profile.children.cycles-pp.anon_vma_clone
0.07 +0.0 0.08 ± 4% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.07 ± 5% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.pick_next_task_fair
0.08 ± 4% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.__pick_next_task
0.07 +0.0 0.08 ± 5% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.22 +0.0 0.24 ± 3% perf-profile.children.cycles-pp.vma_expand
0.09 ± 4% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
0.10 ± 3% +0.0 0.12 ± 3% perf-profile.children.cycles-pp.schedule_idle
0.08 ± 4% +0.0 0.10 ± 3% perf-profile.children.cycles-pp.dequeue_entity
0.09 ± 4% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.try_to_block_task
0.45 +0.0 0.47 perf-profile.children.cycles-pp.__split_vma
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.dequeue_entities
0.16 ± 2% +0.0 0.18 ± 4% perf-profile.children.cycles-pp.commit_merge
0.16 +0.0 0.18 ± 5% perf-profile.children.cycles-pp.vma_complete
0.12 ± 7% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.25 +0.0 0.27 ± 3% perf-profile.children.cycles-pp.vma_merge_new_range
0.54 +0.0 0.56 perf-profile.children.cycles-pp.do_mmap
0.08 ± 5% +0.0 0.11 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.05 ± 7% +0.0 0.08 ± 4% perf-profile.children.cycles-pp.update_curr
0.04 ± 44% +0.0 0.08 ± 6% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.41 +0.0 0.44 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.39 +0.0 0.43 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler
0.18 ± 4% +0.0 0.21 ± 3% perf-profile.children.cycles-pp.schedule
0.17 ± 3% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.schedule_preempt_disabled
0.18 ± 2% +0.0 0.22 ± 2% perf-profile.children.cycles-pp.try_to_wake_up
0.18 ± 4% +0.0 0.22 ± 3% perf-profile.children.cycles-pp.wake_up_q
0.35 +0.0 0.39 ± 2% perf-profile.children.cycles-pp.update_process_times
0.24 ± 3% +0.0 0.28 ± 3% perf-profile.children.cycles-pp.sched_tick
0.25 ± 3% +0.0 0.29 ± 3% perf-profile.children.cycles-pp.rwsem_wake
0.07 +0.0 0.12 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
0.28 ± 2% +0.0 0.33 ± 2% perf-profile.children.cycles-pp.__schedule
0.18 ± 3% +0.1 0.23 ± 5% perf-profile.children.cycles-pp.task_tick_fair
0.00 +0.1 0.06 perf-profile.children.cycles-pp.update_se
0.69 ± 3% +0.1 0.76 ± 2% perf-profile.children.cycles-pp.cpuidle_enter
0.69 ± 3% +0.1 0.76 ± 2% perf-profile.children.cycles-pp.cpuidle_enter_state
0.72 ± 3% +0.1 0.79 ± 2% perf-profile.children.cycles-pp.cpuidle_idle_call
0.36 ± 3% +0.1 0.44 ± 2% perf-profile.children.cycles-pp.intel_idle_irq
0.99 ± 3% +0.1 1.10 ± 2% perf-profile.children.cycles-pp.common_startup_64
0.99 ± 3% +0.1 1.10 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
0.99 ± 3% +0.1 1.10 ± 2% perf-profile.children.cycles-pp.do_idle
0.97 ± 3% +0.1 1.08 ± 2% perf-profile.children.cycles-pp.start_secondary
94.33 -0.2 94.10 perf-profile.self.cycles-pp.osq_lock
0.32 -0.0 0.29 ± 2% perf-profile.self.cycles-pp.rwsem_down_write_slowpath
0.07 ± 5% -0.0 0.05 perf-profile.self.cycles-pp.vms_complete_munmap_vmas
0.05 +0.0 0.06 perf-profile.self.cycles-pp.wake_q_add
0.06 +0.0 0.07 ± 5% perf-profile.self.cycles-pp._raw_spin_lock
0.06 ± 6% +0.0 0.08 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.04 ± 44% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.34 ± 4% +0.1 0.42 perf-profile.self.cycles-pp.intel_idle_irq
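
For readers who want to sanity-check the %change column, the sketch below recomputes it from the two value columns for a few rows copied from the table above. It is not part of the LKP tooling; the helper names `pct_change` and `point_delta` are hypothetical. It assumes that ordinary counters are reported as a relative change in percent, while metrics that are already percentages (turbostat.C6%, the perf-profile rows) appear to be reported as an absolute difference in percentage points. Because the table prints rounded values, a recomputed figure can differ from the reported one in the last digit.

```python
# Illustrative only: recompute the %change column from the base and
# patched columns of the comparison table. Helper names are hypothetical.

def pct_change(base: float, patched: float) -> float:
    """Relative change of patched versus base, in percent."""
    return (patched - base) / base * 100.0


def point_delta(base: float, patched: float) -> float:
    """Absolute difference, for metrics already expressed in percent."""
    return patched - base


# (base, patched) pairs copied from the table above.
relative_rows = {
    "will-it-scale.per_thread_ops": (1109, 1166),
    "will-it-scale.workload": (71075, 74710),
    "perf-stat.i.context-switches": (181255, 200376),
}

# Metrics that are themselves percentages (assumed to use point deltas).
point_rows = {
    "turbostat.C6%": (12.30, 13.54),
    "perf-profile.children.cycles-pp.osq_lock": (94.85, 94.65),
}

if __name__ == "__main__":
    for name, (base, patched) in relative_rows.items():
        print(f"{pct_change(base, patched):+7.1f}%  {name}")
    for name, (base, patched) in point_rows.items():
        print(f"{point_delta(base, patched):+7.1f}   {name}")
```

Note that the perf-profile rows barely move (osq_lock stays near 94% of cycles on both kernels), so the 5.1% throughput change is visible in the will-it-scale rows rather than in the profile.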
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki