Message-Id: <41f8e91b70060e7697840163b80c3dc097aabb34.1770760558.git.tim.c.chen@linux.intel.com>
Date: Tue, 10 Feb 2026 14:18:48 -0800
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>,
	Vincent Guittot <vincent.guittot@...aro.org>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
	Hillf Danton <hdanton@...a.com>,
	Shrikanth Hegde <sshegde@...ux.ibm.com>,
	Jianyong Wu <jianyong.wu@...look.com>,
	Yangyu Chen <cyy@...self.name>,
	Tingyin Duan <tingyin.duan@...il.com>,
	Vern Hao <vernhao@...cent.com>,
	Vern Hao <haoxing990@...il.com>,
	Len Brown <len.brown@...el.com>,
	Aubrey Li <aubrey.li@...el.com>,
	Zhao Liu <zhao1.liu@...el.com>,
	Chen Yu <yu.chen.surf@...il.com>,
	Chen Yu <yu.c.chen@...el.com>,
	Adam Li <adamli@...amperecomputing.com>,
	Aaron Lu <ziqianlu@...edance.com>,
	Tim Chen <tim.c.chen@...el.com>,
	Josh Don <joshdon@...gle.com>,
	Gavin Guo <gavinguo@...lia.com>,
	Qais Yousef <qyousef@...alina.io>,
	Libo Chen <libchen@...estorage.com>,
	linux-kernel@...r.kernel.org
Subject: [PATCH v3 08/21] sched/cache: Calculate the percpu sd task LLC preference

Count, for each runqueue, the number of tasks preferring each LLC.
This statistic is updated during task enqueue and dequeue
operations and is used by cache-aware load balancing.
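
For illustration only, here is a minimal standalone sketch of the
accounting idea (MAX_LLCS, struct rq_stats and the helper names are
hypothetical; this is not the kernel implementation below):

	#include <stdio.h>

	/* One counter per LLC: bumped on enqueue, dropped on dequeue. */
	#define MAX_LLCS	8

	struct rq_stats {
		unsigned int pf[MAX_LLCS];	/* tasks preferring each LLC */
	};

	static void llc_enqueue(struct rq_stats *s, int pref_llc)
	{
		if (pref_llc < 0 || pref_llc >= MAX_LLCS)
			return;
		s->pf[pref_llc]++;
	}

	static void llc_dequeue(struct rq_stats *s, int pref_llc)
	{
		if (pref_llc < 0 || pref_llc >= MAX_LLCS)
			return;
		/* Guard against underflow if the counters were reset meanwhile. */
		if (s->pf[pref_llc])
			s->pf[pref_llc]--;
	}

	int main(void)
	{
		struct rq_stats rq = { { 0 } };

		llc_enqueue(&rq, 2);	/* two tasks preferring LLC 2 enqueued */
		llc_enqueue(&rq, 2);
		llc_dequeue(&rq, 2);	/* one of them dequeued */
		printf("tasks preferring LLC 2: %u\n", rq.pf[2]);
		return 0;
	}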

Co-developed-by: Chen Yu <yu.c.chen@...el.com>
Signed-off-by: Chen Yu <yu.c.chen@...el.com>
Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
---

Notes:
    v2->v3: Move the max_llcs check from patch 4 to this patch.
    This clarifies the rationale for the max_llcs check and
    makes review easier (Peter Zijlstra).

 kernel/sched/fair.c | 56 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6ad9ad2f918f..4a98aa866d65 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1199,28 +1199,80 @@ static int llc_id(int cpu)
 	return per_cpu(sd_llc_id, cpu);
 }
 
+static inline bool valid_llc_id(int id)
+{
+	if (unlikely(id < 0 || id >= max_llcs))
+		return false;
+
+	return true;
+}
+
+static inline bool valid_llc_buf(struct sched_domain *sd,
+				 int id)
+{
+	/*
+	 * Checking both sd and sd->pf confirms that sd->pf[]
+	 * has been allocated in build_sched_domains() after
+	 * per_cpu(sd_llc_id, i) was assigned. This avoids a
+	 * race with a concurrent sched domain rebuild, where
+	 * the LLC id is valid but sd->pf[] is not yet in place.
+	 */
+	if (unlikely(!sd || !sd->pf))
+		return false;
+
+	return valid_llc_id(id);
+}
+
 static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
 {
+	struct sched_domain *sd;
 	int pref_llc;
 
 	pref_llc = p->preferred_llc;
-	if (pref_llc < 0)
+	if (!valid_llc_id(pref_llc))
 		return;
 
 	rq->nr_llc_running++;
 	rq->nr_pref_llc_running += (pref_llc == task_llc(p));
+
+	scoped_guard (rcu) {
+		sd = rcu_dereference(rq->sd);
+		if (valid_llc_buf(sd, pref_llc))
+			sd->pf[pref_llc]++;
+	}
 }
 
 static void account_llc_dequeue(struct rq *rq, struct task_struct *p)
 {
+	struct sched_domain *sd;
 	int pref_llc;
 
 	pref_llc = p->preferred_llc;
-	if (pref_llc < 0)
+	if (!valid_llc_id(pref_llc))
 		return;
 
 	rq->nr_llc_running--;
 	rq->nr_pref_llc_running -= (pref_llc == task_llc(p));
+
+	scoped_guard (rcu) {
+		sd = rcu_dereference(rq->sd);
+		if (valid_llc_buf(sd, pref_llc)) {
+			/*
+			 * There is a race condition between dequeue
+			 * and CPU hotplug. After a task has been enqueued
+			 * on CPUx, a CPU hotplug event occurs, and all online
+			 * CPUs (including CPUx) rebuild their sched_domains
+			 * and reset statistics to zero (including sd->pf).
+			 * This can cause a temporary undercount, so we have to
+			 * guard against underflow in sd->pf.
+			 *
+			 * The undercount is temporary; accurate accounting
+			 * resumes once the rq has had a chance to go idle.
+			 */
+			if (sd->pf[pref_llc])
+				sd->pf[pref_llc]--;
+		}
+	}
 }
 
 void mm_init_sched(struct mm_struct *mm,
-- 
2.32.0

