Message-ID: <202510231406.30bc8aec-lkp@intel.com>
Date: Thu, 23 Oct 2025 15:22:11 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, K Prateek Nayak
<kprateek.nayak@....com>, Tim Chen <tim.c.chen@...ux.intel.com>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>, Vincent Guittot <vincent.guittot@...aro.org>, "Juri
Lelli" <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, "Mel
Gorman" <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi
Vineeth Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu
<jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan
<tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len Brown
<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Libo Chen
<libo.chen@...cle.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
<tim.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH 16/19] sched/fair: Exclude processes with many threads
from cache-aware scheduling
Hello,
kernel test robot noticed a 2.1% regression of will-it-scale.per_thread_ops on:
commit: cb57b28051ef1d84e7cb14db4e1ab99b4f33b4b5 ("[PATCH 16/19] sched/fair: Exclude processes with many threads from cache-aware scheduling")
url: https://github.com/intel-lab-lkp/linux/commits/Tim-Chen/sched-fair-Add-infrastructure-for-cache-aware-load-balancing/20251012-022248
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 45b7f780739a3145aeef24d2dfa02517a6c82ed6
patch link: https://lore.kernel.org/all/637cdb8ab11b1b978d697ed744cc402d32443ecc.1760206683.git.tim.c.chen@linux.intel.com/
patch subject: [PATCH 16/19] sched/fair: Exclude processes with many threads from cache-aware scheduling
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:
nr_task: 100%
mode: thread
test: tlb_flush2
cpufreq_governor: performance
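
For orientation, the tlb_flush2 workload in thread mode has each of the 48 threads
repeatedly fault in an anonymous mapping and then madvise(MADV_DONTNEED) it away,
which drives the page-fault path (do_anonymous_page) and the remote TLB-flush path
(flush_tlb_mm_range / smp_call_function_many_cond) seen in the profile below. The
following is a minimal single-threaded sketch of that loop; the mapping size,
iteration count, and structure are assumptions for illustration, not the actual
will-it-scale source:

/*
 * Illustrative approximation of the per-thread loop exercised by
 * will-it-scale's tlb_flush2 testcase: fault in an anonymous range,
 * then MADV_DONTNEED it so the next pass faults again and the kernel
 * has to flush TLBs on the other CPUs running the same mm.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPSIZE (64UL * 1024 * 1024)	/* assumed working-set size */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long long iterations;
	char *buf;

	buf = mmap(NULL, MAPSIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (iterations = 0; iterations < 1000; iterations++) {
		/* Touch every page: the do_anonymous_page side of the profile. */
		for (unsigned long off = 0; off < MAPSIZE; off += page)
			buf[off] = 1;

		/*
		 * Drop the range: the do_madvise -> tlb_finish_mmu ->
		 * flush_tlb_mm_range side of the profile.
		 */
		if (madvise(buf, MAPSIZE, MADV_DONTNEED)) {
			perror("madvise");
			return 1;
		}
	}

	printf("%llu iterations\n", iterations);
	munmap(buf, MAPSIZE);
	return 0;
}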
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202510231406.30bc8aec-lkp@intel.com
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251023/202510231406.30bc8aec-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/thread/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/tlb_flush2/will-it-scale
commit:
4ac141e433 ("sched/fair: Respect LLC preference in task migration and detach")
cb57b28051 ("sched/fair: Exclude processes with many threads from cache-aware scheduling")
4ac141e4330723c0        cb57b28051ef1d84e7cb14db4e1
----------------        ---------------------------
old value (%stddev)   %change   new value (%stddev)   metric
1482496 -2.1% 1451299 will-it-scale.48.threads
30884 -2.1% 30235 will-it-scale.per_thread_ops
1482496 -2.1% 1451299 will-it-scale.workload
4.447e+08 -2.1% 4.355e+08 proc-vmstat.numa_hit
4.447e+08 -2.1% 4.355e+08 proc-vmstat.numa_local
4.447e+08 -2.1% 4.354e+08 proc-vmstat.pgalloc_normal
8.884e+08 -2.1% 8.698e+08 proc-vmstat.pgfault
4.446e+08 -2.1% 4.353e+08 proc-vmstat.pgfree
6.446e+09 -2.0% 6.318e+09 perf-stat.i.branch-instructions
1.462e+08 -1.4% 1.441e+08 perf-stat.i.branch-misses
1.467e+08 -1.6% 1.444e+08 perf-stat.i.cache-misses
7.692e+08 -1.4% 7.587e+08 perf-stat.i.cache-references
101348 ± 2% +3.6% 104965 perf-stat.i.context-switches
4.14 +1.9% 4.22 perf-stat.i.cpi
883.41 +1.4% 896.20 perf-stat.i.cycles-between-cache-misses
3.083e+10 -2.0% 3.022e+10 perf-stat.i.instructions
0.24 -1.8% 0.24 perf-stat.i.ipc
124.71 -2.0% 122.18 perf-stat.i.metric.K/sec
2944589 -2.1% 2882055 perf-stat.i.minor-faults
2944589 -2.1% 2882055 perf-stat.i.page-faults
4.17 +1.9% 4.25 perf-stat.overall.cpi
876.76 +1.5% 889.96 perf-stat.overall.cycles-between-cache-misses
0.24 -1.8% 0.24 perf-stat.overall.ipc
6.417e+09 -2.0% 6.29e+09 perf-stat.ps.branch-instructions
1.455e+08 -1.4% 1.434e+08 perf-stat.ps.branch-misses
1.46e+08 -1.6% 1.436e+08 perf-stat.ps.cache-misses
7.653e+08 -1.4% 7.549e+08 perf-stat.ps.cache-references
100692 ± 2% +3.6% 104309 perf-stat.ps.context-switches
3.069e+10 -2.0% 3.008e+10 perf-stat.ps.instructions
2931887 -2.1% 2869944 perf-stat.ps.minor-faults
2931887 -2.1% 2869944 perf-stat.ps.page-faults
9.273e+12 -1.9% 9.096e+12 perf-stat.total.instructions
62.03 -1.8 60.18 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise
63.66 -1.8 61.82 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise.do_syscall_64
61.19 -1.8 59.36 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.do_madvise
65.49 -1.7 63.79 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
75.54 -1.5 74.02 perf-profile.calltrace.cycles-pp.__madvise
71.89 -1.5 70.41 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
72.40 -1.5 70.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
71.83 -1.5 70.35 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
72.35 -1.5 70.87 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
15.31 -0.6 14.70 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
12.04 -0.5 11.52 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
10.97 -0.5 10.47 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
11.08 -0.5 10.58 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
4.36 -0.2 4.15 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
4.53 -0.2 4.34 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
2.02 ± 2% -0.1 1.95 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.47 -0.1 1.42 perf-profile.calltrace.cycles-pp.folio_add_lru.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.83 ± 2% -0.0 0.80 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.65 ± 2% -0.0 0.62 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault
0.73 ± 2% -0.0 0.70 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.64 ± 2% -0.0 0.61 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault
0.73 -0.0 0.71 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.testcase
0.84 -0.0 0.82 perf-profile.calltrace.cycles-pp.clear_page_erms.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol
1.65 +0.0 1.68 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
1.92 +0.0 1.96 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.79 +0.0 1.83 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
0.92 ± 3% +0.1 1.04 perf-profile.calltrace.cycles-pp.tlb_gather_mmu.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.83 ± 6% +0.2 3.04 ± 2% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
7.06 +0.4 7.48 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
6.25 +0.4 6.70 perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.48 +0.5 3.02 perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +0.7 0.74 ± 5% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
0.00 +0.9 0.94 ± 4% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
19.08 +1.3 20.36 perf-profile.calltrace.cycles-pp.testcase
14.17 +1.3 15.46 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
12.74 +1.4 14.10 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
12.49 +1.4 13.85 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
7.97 +1.4 9.38 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
62.25 -1.9 60.39 perf-profile.children.cycles-pp.smp_call_function_many_cond
62.26 -1.9 60.40 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
63.94 -1.8 62.08 perf-profile.children.cycles-pp.flush_tlb_mm_range
65.76 -1.7 64.06 perf-profile.children.cycles-pp.tlb_finish_mmu
75.72 -1.5 74.19 perf-profile.children.cycles-pp.__madvise
73.51 -1.5 72.02 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
73.48 -1.5 71.99 perf-profile.children.cycles-pp.do_syscall_64
71.90 -1.5 70.41 perf-profile.children.cycles-pp.__x64_sys_madvise
71.85 -1.5 70.36 perf-profile.children.cycles-pp.do_madvise
18.46 -0.3 18.12 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
19.79 -0.3 19.47 perf-profile.children.cycles-pp.sysvec_call_function
22.68 -0.3 22.36 perf-profile.children.cycles-pp.asm_sysvec_call_function
17.97 -0.3 17.65 perf-profile.children.cycles-pp.__sysvec_call_function
7.64 -0.2 7.46 perf-profile.children.cycles-pp.flush_tlb_func
7.49 -0.1 7.39 perf-profile.children.cycles-pp.llist_reverse_order
1.47 -0.1 1.42 perf-profile.children.cycles-pp.folio_add_lru
1.99 -0.0 1.94 perf-profile.children.cycles-pp.__pte_offset_map_lock
1.84 -0.0 1.80 perf-profile.children.cycles-pp._raw_spin_lock
1.31 -0.0 1.27 perf-profile.children.cycles-pp.folio_batch_move_lru
0.93 -0.0 0.90 perf-profile.children.cycles-pp.error_entry
0.42 -0.0 0.40 perf-profile.children.cycles-pp.vms_clear_ptes
0.89 -0.0 0.87 perf-profile.children.cycles-pp.clear_page_erms
0.94 -0.0 0.92 perf-profile.children.cycles-pp.prep_new_page
1.66 +0.0 1.69 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
1.80 +0.0 1.84 perf-profile.children.cycles-pp.alloc_pages_mpol
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__pi_memset
0.96 ± 3% +0.1 1.08 perf-profile.children.cycles-pp.tlb_gather_mmu
2.90 ± 6% +0.2 3.10 perf-profile.children.cycles-pp.intel_idle
3.23 ± 5% +0.2 3.45 perf-profile.children.cycles-pp.cpuidle_enter
3.32 ± 5% +0.2 3.54 perf-profile.children.cycles-pp.cpuidle_idle_call
7.09 +0.4 7.51 perf-profile.children.cycles-pp.__handle_mm_fault
6.27 +0.4 6.72 perf-profile.children.cycles-pp.do_anonymous_page
0.43 ± 6% +0.5 0.94 ± 4% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.25 ± 11% +0.5 0.76 ± 5% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
2.49 +0.5 3.03 perf-profile.children.cycles-pp.alloc_anon_folio
19.67 +1.3 20.94 perf-profile.children.cycles-pp.testcase
14.47 +1.3 15.77 perf-profile.children.cycles-pp.asm_exc_page_fault
12.76 +1.4 14.12 perf-profile.children.cycles-pp.exc_page_fault
12.60 +1.4 13.96 perf-profile.children.cycles-pp.do_user_addr_fault
7.99 +1.5 9.44 perf-profile.children.cycles-pp.handle_mm_fault
42.94 -1.2 41.71 perf-profile.self.cycles-pp.smp_call_function_many_cond
6.02 -0.2 5.87 perf-profile.self.cycles-pp.flush_tlb_func
7.46 -0.1 7.36 perf-profile.self.cycles-pp.llist_reverse_order
1.44 -0.0 1.40 perf-profile.self.cycles-pp.lock_vma_under_rcu
0.88 -0.0 0.85 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.91 -0.0 0.88 perf-profile.self.cycles-pp.error_entry
0.07 +0.0 0.08 ± 4% perf-profile.self.cycles-pp.get_page_from_freelist
0.76 ± 2% +0.1 0.86 perf-profile.self.cycles-pp.tlb_gather_mmu
1.10 ± 5% +0.1 1.24 ± 2% perf-profile.self.cycles-pp.tlb_finish_mmu
2.90 ± 6% +0.2 3.10 perf-profile.self.cycles-pp.intel_idle
0.20 ± 10% +0.4 0.62 ± 6% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
0.15 ± 4% +0.8 1.00 ± 3% perf-profile.self.cycles-pp.handle_mm_fault
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki