Message-ID: <3df5a8c1-7074-4fcf-adf8-d39137314fd6@intel.com>
Date: Tue, 14 Oct 2025 13:16:16 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>, Peter Zijlstra
	<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K Prateek Nayak
	<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
	"Vern Hao" <vernhao@...cent.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
 Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Madadi Vineeth
 Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, "Shrikanth
 Hegde" <sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>,
	"Yangyu Chen" <cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Len
 Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
	<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Libo Chen
	<libo.chen@...cle.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
	<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 06/19] sched/fair: Assign preferred LLC ID to processes

(Copying Vern's question here, as his email does not seem to have reached LKML.)

On 10/14/2025 2:09 AM, Tim Chen wrote:
 > On Mon, 2025-10-13 at 17:10 +0800, vernhao wrote:
 >>
 >> Tim Chen<tim.c.chen@...ux.intel.com> wrote:
 >> With cache-aware scheduling enabled, each task is assigned a
 >> preferred LLC ID. This allows quick identification of the LLC domain
 >> where the task prefers to run, similar to numa_preferred_nid in
 >> NUMA balancing.
 >>
 >> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>

[snip]

 >> +
 >> + if (mm->mm_sched_cpu != -1)
 >> + mm_sched_llc = per_cpu(sd_llc_id, mm->mm_sched_cpu);
 >>
 >> In high-concurrency multi-threaded scenarios, not all threads handle
 >> the same events, so their hot data in the LLC is not completely shared.
 >> Therefore, if every thread's preferred LLC is migrated to the LLC
 >> pointed to by mm->mm_sched_cpu, this would lead to the incorrect
 >> assumption that all threads prefer the same LLC, thereby intensifying
 >> competition between LLCs.
 >
 > Yes, that's the reason why we stop aggregating to the preferred LLC
 > once the utilization of that LLC becomes too high relative to the
 > other LLCs.
 >
 > If you know your threads' characteristics beforehand, i.e. which of
 > them share data, you can probably use cgroup/cpuset from user space
 > to separate the threads.
 >
 > There's not enough info from occupancy data for the OS to group
 > the threads by data sharing. Perhaps an alternative, if NUMA balancing
 > is on, is to group tasks by their task numa group instead of by mm.
 >
 > That would incur the page scanning overhead etc. and make
 > cache aware scheduling dependent on NUMA balancing.
 >
 >
 >>
 >> So I'm wondering, why not move ‘mm->mm_sched_cpu’ to ‘task_struct’,
 >> so that each thread can individually track its preferred LLC?
 >> What are the losses in doing so?
 >
 > You would need a way to group related tasks together and put them
 > on the same LLC.  Either group them by mm or some other means.
 >

While Vern's use case is common in production environments, switching
to per-task_struct prefer_llc might not aggregate the threads to
dedicated LLCs. It is possible that each thread will stick to its
old LLC because the thread was forked there and the occupancy is
high on that old LLC. As a result, threads are randomly "pinned"
to different LLCs.

The question becomes: how can we figure out the threads that share
data? Can the kernel detect this, or get the hint from user space?

Yes, the numa_group in NUMA balancing indicates that several tasks
access the same pages, which could be one indicator. Besides, if
task A frequently wakes up task B, does it mean A and B have the
potential to share data? Furthermore, if task A wakes up B via a
pipe, it might also indicate that A has something to share with B.
I just wonder if we can introduce a structure to gather this
information together.
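
To make the idea concrete, here is a rough userspace sketch of such a
structure, not a proposal for actual kernel code. All names
(share_hint, share_stats, share_record, share_score,
share_should_group) are hypothetical and do not exist in the kernel,
and the weights are arbitrary placeholders. It just counts the three
hints mentioned above for one pair of tasks and folds them into a
single score:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hint sources; none of these names exist in the kernel. */
enum share_hint {
	SHARE_HINT_NUMA_GROUP,	/* both tasks faulted on the same pages */
	SHARE_HINT_WAKEUP,	/* task A woke up task B */
	SHARE_HINT_PIPE,	/* task A woke up task B through a pipe */
	SHARE_HINT_MAX
};

/* Per task-pair counters, one slot per hint source. */
struct share_stats {
	uint32_t count[SHARE_HINT_MAX];
};

/* Arbitrary illustrative weights, indexed by enum share_hint. */
static const uint32_t share_weight[SHARE_HINT_MAX] = { 4, 1, 2 };

/* Record one observation of a hint, saturating instead of wrapping. */
static void share_record(struct share_stats *s, enum share_hint h)
{
	if (s->count[h] < UINT32_MAX)
		s->count[h]++;
}

/* Weighted sum of all hints seen so far for this pair. */
static uint64_t share_score(const struct share_stats *s)
{
	uint64_t score = 0;
	size_t i;

	for (i = 0; i < SHARE_HINT_MAX; i++)
		score += (uint64_t)s->count[i] * share_weight[i];
	return score;
}

/* Nonzero if the pair looks related enough to share a preferred LLC. */
static int share_should_group(const struct share_stats *s, uint64_t threshold)
{
	return share_score(s) >= threshold;
}
```

A real implementation would also need to decay old counts and bound
the per-pair memory cost, both of which are left out here.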

thanks,
Chenyu
