lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <637cdb8ab11b1b978d697ed744cc402d32443ecc.1760206683.git.tim.c.chen@linux.intel.com>
Date: Sat, 11 Oct 2025 11:24:53 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>
Cc: Chen Yu <yu.c.chen@...el.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
	Hillf Danton <hdanton@...a.com>,
	Shrikanth Hegde <sshegde@...ux.ibm.com>,
	Jianyong Wu <jianyong.wu@...look.com>,
	Yangyu Chen <cyy@...self.name>,
	Tingyin Duan <tingyin.duan@...il.com>,
	Vern Hao <vernhao@...cent.com>,
	Len Brown <len.brown@...el.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Aubrey Li <aubrey.li@...el.com>,
	Zhao Liu <zhao1.liu@...el.com>,
	Chen Yu <yu.chen.surf@...il.com>,
	Libo Chen <libo.chen@...cle.com>,
	Adam Li <adamli@...amperecomputing.com>,
	Tim Chen <tim.c.chen@...el.com>,
	linux-kernel@...r.kernel.org
Subject: [PATCH 16/19] sched/fair: Exclude processes with many threads from cache-aware scheduling

From: Chen Yu <yu.c.chen@...el.com>

A performance regression was observed by Prateek when running hackbench
with many threads per process (high fd count). To avoid this, processes
with a large number of active threads are excluded from cache-aware
scheduling.

With sched_cache enabled, record the number of active threads in each
process during the periodic task_cache_work(). While iterating over
CPUs, if the currently running task belongs to the same process as the
task that launched task_cache_work(), increment the active thread count.

If the count exceeds the number of CPUs in the process's preferred LLC,
sched_cache will avoid aggregating too many threads into a single LLC
domain.

Reported-by: K Prateek Nayak <kprateek.nayak@....com>
Signed-off-by: Chen Yu <yu.c.chen@...el.com>
Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
---
 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3ca557c2f36d..b307f81b2fde 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1031,6 +1031,7 @@ struct mm_struct {
 		raw_spinlock_t mm_sched_lock;
 		unsigned long mm_sched_epoch;
 		int mm_sched_cpu;
+		u64 nr_running_avg ____cacheline_aligned_in_smp;
 #endif
 
 #ifdef CONFIG_MMU
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 65ff7c306a2f..79d109f8a09f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1451,12 +1451,13 @@ static void get_scan_cpumasks(cpumask_var_t cpus, int cache_cpu,
 
 static void __no_profile task_cache_work(struct callback_head *work)
 {
-	struct task_struct *p = current;
+	struct task_struct *p = current, *cur;
 	struct mm_struct *mm = p->mm;
 	unsigned long m_a_occ = 0;
 	unsigned long curr_m_a_occ = 0;
 	int cpu, m_a_cpu = -1, cache_cpu,
-	    pref_nid = NUMA_NO_NODE, curr_cpu;
+	    pref_nid = NUMA_NO_NODE, curr_cpu,
+	    nr_running = 0;
 	cpumask_var_t cpus;
 
 	WARN_ON_ONCE(work != &p->cache_work);
@@ -1497,6 +1498,14 @@ static void __no_profile task_cache_work(struct callback_head *work)
 					m_occ = occ;
 					m_cpu = i;
 				}
+
+				rcu_read_lock();
+				cur = rcu_dereference(cpu_rq(i)->curr);
+				if (cur && !(cur->flags & (PF_EXITING | PF_KTHREAD)) &&
+				    cur->mm == mm)
+					nr_running++;
+				rcu_read_unlock();
+
 			}
 
 			/*
@@ -1540,6 +1549,7 @@ static void __no_profile task_cache_work(struct callback_head *work)
 		mm->mm_sched_cpu = m_a_cpu;
 	}
 
+	update_avg(&mm->nr_running_avg, nr_running);
 	free_cpumask_var(cpus);
 }
 
-- 
2.32.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ