linux-kernel - Re: [PATCH 06/19] sched/fair: Assign preferred LLC ID to processes

Message-ID: <3df5a8c1-7074-4fcf-adf8-d39137314fd6@intel.com>
Date: Tue, 14 Oct 2025 13:16:16 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, Peter Zijlstra
	<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K Prateek Nayak
	<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
	"Vern Hao" <vernhao@...cent.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
 Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi Vineeth
 Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, "Shrikanth
 Hegde" <sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>,
	"Yangyu Chen" <cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Len
 Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
	<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Libo Chen
	<libo.chen@...cle.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
	<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 06/19] sched/fair: Assign preferred LLC ID to processes

(Copied the question from Vern, as his email does not seem to have reached LKML)

On 10/14/2025 2:09 AM, Tim Chen wrote:
 > On Mon, 2025-10-13 at 17:10 +0800, vernhao wrote:
 >>
 >> Tim Chen<tim.c.chen@...ux.intel.com> wrote:
 >> With cache-aware scheduling enabled, each task is assigned a
 >> preferred LLC ID. This allows quick identification of the LLC domain
 >> where the task prefers to run, similar to numa_preferred_nid in
 >> NUMA balancing.
 >>
 >> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>

[snip]

 >> +
 >> +	if (mm->mm_sched_cpu != -1)
 >> +		mm_sched_llc = per_cpu(sd_llc_id, mm->mm_sched_cpu);
 >>
 >> In high-concurrency multi-threaded scenarios, not all threads handle
 >> same events, so their hot data in the LLC is not completely shared.
 >> Therefore, if every thread's preferred LLC is migrated to the LLC
 >> pointed to by mm->mm_sched_cpu, this would lead to the incorrect
 >> assumption that all threads prefer the same LLC, thereby intensifying
 >> competition between LLCs.
 >
 > Yes, that's the reason why we stop aggregating to the preferred LLC
 > once the utilization of the LLC becomes too high relative to the
 > other LLCs.
 >
 > If you know your threads' characteristics beforehand, i.e. which of
 > them share data, you can probably use cgroup/cpuset from user space
 > to separate out the threads.
 >
 > There's not enough info from occupancy data for OS to group
 > the threads by data sharing. Perhaps an alternative if NUMA balancing
 > is on is to group tasks by their task numa group instead of by mm.
 >
 > That would incur the page scanning overhead etc. and make
 > cache-aware scheduling dependent on NUMA balancing.
 >
 >
 >>
 >> So I'm wondering, why not move ‘mm->mm_sched_cpu’ to ‘task_struct’,
 >> so that each thread can individually track its preferred LLC?
 >> What are the losses in doing so?
 >
 > You would need a way to group related tasks together and put them
 > on the same LLC.  Either group them by mm or some other means.
 >

While Vern's use case is common in production environments, switching
to per-task_struct prefer_llc might not aggregate the threads to
dedicated LLCs. It is possible that each thread will stick to its
old LLC because the thread was forked there and the occupancy is
high on that old LLC. As a result, threads are randomly "pinned"
to different LLCs.
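
To make the concern concrete, here is a toy user-space model (not code
from the patch set). It assumes a hypothetical per-thread, per-LLC
occupancy table and compares picking the preferred LLC per task with
picking it from the occupancy summed over the whole mm. NR_LLC,
argmax_llc(), per_task_pref() and per_mm_pref() are names made up for
this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LLC 4	/* hypothetical number of LLC domains */

/* Return the index of the LLC with the highest occupancy. */
static int argmax_llc(const unsigned long occ[NR_LLC])
{
	int best = 0;

	for (int i = 1; i < NR_LLC; i++)
		if (occ[i] > occ[best])
			best = i;
	return best;
}

/* Per-task choice: each thread only looks at its own occupancy. */
static int per_task_pref(unsigned long occ[][NR_LLC], size_t task)
{
	return argmax_llc(occ[task]);
}

/* Per-mm choice: occupancy is summed over all threads of the process. */
static int per_mm_pref(unsigned long occ[][NR_LLC], size_t nr_tasks)
{
	unsigned long sum[NR_LLC] = { 0 };

	for (size_t t = 0; t < nr_tasks; t++)
		for (int i = 0; i < NR_LLC; i++)
			sum[i] += occ[t][i];
	return argmax_llc(sum);
}
```

With three threads whose occupancy is concentrated on the LLC where
each of them started, per_task_pref() returns a different LLC for each
thread, while per_mm_pref() returns a single LLC for all of them.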

The question becomes: how can we figure out which threads share
data? Can the kernel detect this, or get a hint from user space?

Yes, the numa_group in NUMA balancing indicates that several tasks
access the same pages, which could be an indicator. Besides, if task A
frequently wakes up task B, does that mean A and B are likely to share
data? Furthermore, if task A wakes up B via a pipe, it might also
indicate that A has something to share with B. I just wonder if we can
introduce a structure to gather this information together.
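
As a rough illustration of what such a structure might look like, here
is a user-space sketch only. Everything in it is hypothetical: struct
share_hints, record_wakeup(), share_score(), the fixed MAX_TASKS table
and the weights are assumptions for discussion, not existing kernel
interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TASKS 8	/* hypothetical small fixed table for the sketch */

/* Hypothetical container collecting data-sharing hints between tasks. */
struct share_hints {
	unsigned int wakeups[MAX_TASKS][MAX_TASKS];	 /* waker -> wakee */
	unsigned int pipe_wakeups[MAX_TASKS][MAX_TASKS]; /* subset via pipe */
	int numa_group[MAX_TASKS];			 /* -1: no group */
};

static void share_hints_init(struct share_hints *h)
{
	memset(h, 0, sizeof(*h));
	for (int i = 0; i < MAX_TASKS; i++)
		h->numa_group[i] = -1;
}

/* Record that @waker woke up @wakee, optionally through a pipe. */
static void record_wakeup(struct share_hints *h, int waker, int wakee,
			  bool via_pipe)
{
	assert(waker >= 0 && waker < MAX_TASKS);
	assert(wakee >= 0 && wakee < MAX_TASKS);

	h->wakeups[waker][wakee]++;
	if (via_pipe)
		h->pipe_wakeups[waker][wakee]++;
}

/*
 * Combine the hints into one score for the pair (a, b). The weights
 * (1 per wakeup, 2 extra per pipe wakeup, 10 for a shared numa_group)
 * are arbitrary placeholders.
 */
static unsigned int share_score(const struct share_hints *h, int a, int b)
{
	unsigned int score;

	assert(a >= 0 && a < MAX_TASKS);
	assert(b >= 0 && b < MAX_TASKS);

	score = h->wakeups[a][b] + h->wakeups[b][a];
	score += 2 * (h->pipe_wakeups[a][b] + h->pipe_wakeups[b][a]);
	if (h->numa_group[a] != -1 && h->numa_group[a] == h->numa_group[b])
		score += 10;
	return score;
}
```

Tasks whose pairwise score exceeds some threshold could then be
treated as one group when choosing a preferred LLC. How to bound the
bookkeeping cost in the real wakeup path is left open here.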

thanks,
Chenyu