Message-ID: <202510231406.30bc8aec-lkp@intel.com>
Date: Thu, 23 Oct 2025 15:22:11 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, K Prateek Nayak
<kprateek.nayak@....com>, Tim Chen <tim.c.chen@...ux.intel.com>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>, Vincent Guittot <vincent.guittot@...aro.org>, "Juri
Lelli" <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, "Mel
Gorman" <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi
Vineeth Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu
<jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan
<tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len Brown
<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Libo Chen
<libo.chen@...cle.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
<tim.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH 16/19] sched/fair: Exclude processes with many threads
from cache-aware scheduling
Hello,
kernel test robot noticed a 2.1% regression of will-it-scale.per_thread_ops on:
commit: cb57b28051ef1d84e7cb14db4e1ab99b4f33b4b5 ("[PATCH 16/19] sched/fair: Exclude processes with many threads from cache-aware scheduling")
url: https://github.com/intel-lab-lkp/linux/commits/Tim-Chen/sched-fair-Add-infrastructure-for-cache-aware-load-balancing/20251012-022248
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 45b7f780739a3145aeef24d2dfa02517a6c82ed6
patch link: https://lore.kernel.org/all/637cdb8ab11b1b978d697ed744cc402d32443ecc.1760206683.git.tim.c.chen@linux.intel.com/
patch subject: [PATCH 16/19] sched/fair: Exclude processes with many threads from cache-aware scheduling
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:
nr_task: 100%
mode: thread
test: tlb_flush2
cpufreq_governor: performance
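
For orientation, the tlb_flush2 workload in thread mode has each of the 48 threads
repeatedly fault in an anonymous mapping and then madvise(MADV_DONTNEED) it away,
which drives the page-fault path (do_anonymous_page) and the remote TLB-flush path
(flush_tlb_mm_range / smp_call_function_many_cond) seen in the profile below. The
following is a minimal single-threaded sketch of that loop; the mapping size,
iteration count, and structure are assumptions for illustration, not the actual
will-it-scale source:

/*
 * Illustrative approximation of the per-thread loop exercised by
 * will-it-scale's tlb_flush2 testcase: fault in an anonymous range,
 * then MADV_DONTNEED it so the next pass faults again and the kernel
 * has to flush TLBs on the other CPUs running the same mm.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPSIZE (64UL * 1024 * 1024)	/* assumed working-set size */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long long iterations;
	char *buf;

	buf = mmap(NULL, MAPSIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (iterations = 0; iterations < 1000; iterations++) {
		/* Touch every page: the do_anonymous_page side of the profile. */
		for (unsigned long off = 0; off < MAPSIZE; off += page)
			buf[off] = 1;

		/*
		 * Drop the range: the do_madvise -> tlb_finish_mmu ->
		 * flush_tlb_mm_range side of the profile.
		 */
		if (madvise(buf, MAPSIZE, MADV_DONTNEED)) {
			perror("madvise");
			return 1;
		}
	}

	printf("%llu iterations\n", iterations);
	munmap(buf, MAPSIZE);
	return 0;
}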
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202510231406.30bc8aec-lkp@intel.com
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251023/202510231406.30bc8aec-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/thread/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/tlb_flush2/will-it-scale
commit:
4ac141e433 ("sched/fair: Respect LLC preference in task migration and detach")
cb57b28051 ("sched/fair: Exclude processes with many threads from cache-aware scheduling")
4ac141e4330723c0        cb57b28051ef1d84e7cb14db4e1
----------------        ---------------------------
old value (%stddev)   %change   new value (%stddev)   metric
1482496 -2.1% 1451299 will-it-scale.48.threads
30884 -2.1% 30235 will-it-scale.per_thread_ops
1482496 -2.1% 1451299 will-it-scale.workload
4.447e+08 -2.1% 4.355e+08 proc-vmstat.numa_hit
4.447e+08 -2.1% 4.355e+08 proc-vmstat.numa_local
4.447e+08 -2.1% 4.354e+08 proc-vmstat.pgalloc_normal
8.884e+08 -2.1% 8.698e+08 proc-vmstat.pgfault
4.446e+08 -2.1% 4.353e+08 proc-vmstat.pgfree
6.446e+09 -2.0% 6.318e+09 perf-stat.i.branch-instructions
1.462e+08 -1.4% 1.441e+08 perf-stat.i.branch-misses
1.467e+08 -1.6% 1.444e+08 perf-stat.i.cache-misses
7.692e+08 -1.4% 7.587e+08 perf-stat.i.cache-references
101348 ± 2% +3.6% 104965 perf-stat.i.context-switches
4.14 +1.9% 4.22 perf-stat.i.cpi
883.41 +1.4% 896.20 perf-stat.i.cycles-between-cache-misses
3.083e+10 -2.0% 3.022e+10 perf-stat.i.instructions
0.24 -1.8% 0.24 perf-stat.i.ipc
124.71 -2.0% 122.18 perf-stat.i.metric.K/sec
2944589 -2.1% 2882055 perf-stat.i.minor-faults
2944589 -2.1% 2882055 perf-stat.i.page-faults
4.17 +1.9% 4.25 perf-stat.overall.cpi
876.76 +1.5% 889.96 perf-stat.overall.cycles-between-cache-misses
0.24 -1.8% 0.24 perf-stat.overall.ipc
6.417e+09 -2.0% 6.29e+09 perf-stat.ps.branch-instructions
1.455e+08 -1.4% 1.434e+08 perf-stat.ps.branch-misses
1.46e+08 -1.6% 1.436e+08 perf-stat.ps.cache-misses
7.653e+08 -1.4% 7.549e+08 perf-stat.ps.cache-references
100692 ± 2% +3.6% 104309 perf-stat.ps.context-switches
3.069e+10 -2.0% 3.008e+10 perf-stat.ps.instructions
2931887 -2.1% 2869944 perf-stat.ps.minor-faults
2931887 -2.1% 2869944 perf-stat.ps.page-faults
9.273e+12 -1.9% 9.096e+12 perf-stat.total.instructions
62.03 -1.8 60.18 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise
63.66 -1.8 61.82 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.do_madvise.__x64_sys_madvise.do_syscall_64
61.19 -1.8 59.36 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.do_madvise
65.49 -1.7 63.79 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
75.54 -1.5 74.02 perf-profile.calltrace.cycles-pp.__madvise
71.89 -1.5 70.41 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
72.40 -1.5 70.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
71.83 -1.5 70.35 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
72.35 -1.5 70.87 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
15.31 -0.6 14.70 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
12.04 -0.5 11.52 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
10.97 -0.5 10.47 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
11.08 -0.5 10.58 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
4.36 -0.2 4.15 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
4.53 -0.2 4.34 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
2.02 ± 2% -0.1 1.95 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.47 -0.1 1.42 perf-profile.calltrace.cycles-pp.folio_add_lru.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.83 ± 2% -0.0 0.80 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.65 ± 2% -0.0 0.62 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault
0.73 ± 2% -0.0 0.70 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.64 ± 2% -0.0 0.61 ± 2% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault
0.73 -0.0 0.71 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.testcase
0.84 -0.0 0.82 perf-profile.calltrace.cycles-pp.clear_page_erms.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol
1.65 +0.0 1.68 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
1.92 +0.0 1.96 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.79 +0.0 1.83 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
0.92 ± 3% +0.1 1.04 perf-profile.calltrace.cycles-pp.tlb_gather_mmu.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.83 ± 6% +0.2 3.04 ± 2% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
7.06 +0.4 7.48 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
6.25 +0.4 6.70 perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.48 +0.5 3.02 perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +0.7 0.74 ± 5% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
0.00 +0.9 0.94 ± 4% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
19.08 +1.3 20.36 perf-profile.calltrace.cycles-pp.testcase
14.17 +1.3 15.46 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
12.74 +1.4 14.10 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
12.49 +1.4 13.85 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
7.97 +1.4 9.38 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
62.25 -1.9 60.39 perf-profile.children.cycles-pp.smp_call_function_many_cond
62.26 -1.9 60.40 perf-profile.children.cycles-pp.on_each_cpu_cond_mask
63.94 -1.8 62.08 perf-profile.children.cycles-pp.flush_tlb_mm_range
65.76 -1.7 64.06 perf-profile.children.cycles-pp.tlb_finish_mmu
75.72 -1.5 74.19 perf-profile.children.cycles-pp.__madvise
73.51 -1.5 72.02 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
73.48 -1.5 71.99 perf-profile.children.cycles-pp.do_syscall_64
71.90 -1.5 70.41 perf-profile.children.cycles-pp.__x64_sys_madvise
71.85 -1.5 70.36 perf-profile.children.cycles-pp.do_madvise
18.46 -0.3 18.12 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
19.79 -0.3 19.47 perf-profile.children.cycles-pp.sysvec_call_function
22.68 -0.3 22.36 perf-profile.children.cycles-pp.asm_sysvec_call_function
17.97 -0.3 17.65 perf-profile.children.cycles-pp.__sysvec_call_function
7.64 -0.2 7.46 perf-profile.children.cycles-pp.flush_tlb_func
7.49 -0.1 7.39 perf-profile.children.cycles-pp.llist_reverse_order
1.47 -0.1 1.42 perf-profile.children.cycles-pp.folio_add_lru
1.99 -0.0 1.94 perf-profile.children.cycles-pp.__pte_offset_map_lock
1.84 -0.0 1.80 perf-profile.children.cycles-pp._raw_spin_lock
1.31 -0.0 1.27 perf-profile.children.cycles-pp.folio_batch_move_lru
0.93 -0.0 0.90 perf-profile.children.cycles-pp.error_entry
0.42 -0.0 0.40 perf-profile.children.cycles-pp.vms_clear_ptes
0.89 -0.0 0.87 perf-profile.children.cycles-pp.clear_page_erms
0.94 -0.0 0.92 perf-profile.children.cycles-pp.prep_new_page
1.66 +0.0 1.69 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
1.80 +0.0 1.84 perf-profile.children.cycles-pp.alloc_pages_mpol
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__pi_memset
0.96 ± 3% +0.1 1.08 perf-profile.children.cycles-pp.tlb_gather_mmu
2.90 ± 6% +0.2 3.10 perf-profile.children.cycles-pp.intel_idle
3.23 ± 5% +0.2 3.45 perf-profile.children.cycles-pp.cpuidle_enter
3.32 ± 5% +0.2 3.54 perf-profile.children.cycles-pp.cpuidle_idle_call
7.09 +0.4 7.51 perf-profile.children.cycles-pp.__handle_mm_fault
6.27 +0.4 6.72 perf-profile.children.cycles-pp.do_anonymous_page
0.43 ± 6% +0.5 0.94 ± 4% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.25 ± 11% +0.5 0.76 ± 5% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
2.49 +0.5 3.03 perf-profile.children.cycles-pp.alloc_anon_folio
19.67 +1.3 20.94 perf-profile.children.cycles-pp.testcase
14.47 +1.3 15.77 perf-profile.children.cycles-pp.asm_exc_page_fault
12.76 +1.4 14.12 perf-profile.children.cycles-pp.exc_page_fault
12.60 +1.4 13.96 perf-profile.children.cycles-pp.do_user_addr_fault
7.99 +1.5 9.44 perf-profile.children.cycles-pp.handle_mm_fault
42.94 -1.2 41.71 perf-profile.self.cycles-pp.smp_call_function_many_cond
6.02 -0.2 5.87 perf-profile.self.cycles-pp.flush_tlb_func
7.46 -0.1 7.36 perf-profile.self.cycles-pp.llist_reverse_order
1.44 -0.0 1.40 perf-profile.self.cycles-pp.lock_vma_under_rcu
0.88 -0.0 0.85 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.91 -0.0 0.88 perf-profile.self.cycles-pp.error_entry
0.07 +0.0 0.08 ± 4% perf-profile.self.cycles-pp.get_page_from_freelist
0.76 ± 2% +0.1 0.86 perf-profile.self.cycles-pp.tlb_gather_mmu
1.10 ± 5% +0.1 1.24 ± 2% perf-profile.self.cycles-pp.tlb_finish_mmu
2.90 ± 6% +0.2 3.10 perf-profile.self.cycles-pp.intel_idle
0.20 ± 10% +0.4 0.62 ± 6% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
0.15 ± 4% +0.8 1.00 ± 3% perf-profile.self.cycles-pp.handle_mm_fault
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki