Message-ID: <4ec19969-831c-4d9e-b585-fc02db31b343@intel.com>
Date: Fri, 17 Oct 2025 12:50:47 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Tim Chen <tim.c.chen@...ux.intel.com>, Ingo Molnar <mingo@...hat.com>, "K
Prateek Nayak" <kprateek.nayak@....com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>, Vern Hao <vernhao@...cent.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>, "Dietmar
Eggemann" <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, "Valentin
Schneider" <vschneid@...hat.com>, Madadi Vineeth Reddy
<vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, Shrikanth Hegde
<sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen
<cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Len Brown
<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li
<adamli@...amperecomputing.com>, Tim Chen <tim.c.chen@...el.com>,
<linux-kernel@...r.kernel.org>, <haoxing990@...il.com>
Subject: Re: [PATCH 06/19] sched/fair: Assign preferred LLC ID to processes

On 10/15/2025 7:15 PM, Peter Zijlstra wrote:
> On Tue, Oct 14, 2025 at 01:16:16PM +0800, Chen, Yu C wrote:
>
>> The question becomes: how can we figure out the threads that share
>> data? Can the kernel detect this, or get the hint from user space?
>
> This needs the PMU, then you can steer using cache-miss ratios. But then
> people will hate us for using counters.
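As a rough illustration of the PMU approach (not something this series does), a thread's cache-miss ratio can be sampled from user space with perf_event_open(2). The helper names below are hypothetical. The generic PERF_COUNT_HW_CACHE_* events usually count last-level cache accesses, but that varies by CPU, and opening them may fail under a restrictive perf_event_paranoid setting or inside a VM.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open one disabled hardware counter for the
 * calling thread on any CPU. Returns an fd, or -1 if unavailable. */
static int open_hw_counter(uint64_t config)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = config;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	/* pid = 0: this thread; cpu = -1: any CPU; no group leader; no flags. */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Fraction of counted cache references that missed; 0.0 if none counted. */
static double cache_miss_ratio(uint64_t misses, uint64_t references)
{
	return references ? (double)misses / (double)references : 0.0;
}

/* Hypothetical helper: run fn(arg) with both counters enabled and return
 * its miss ratio, or a negative value if the counters cannot be used. */
static double measure_miss_ratio(void (*fn)(void *), void *arg)
{
	int refs_fd = open_hw_counter(PERF_COUNT_HW_CACHE_REFERENCES);
	int miss_fd = open_hw_counter(PERF_COUNT_HW_CACHE_MISSES);
	uint64_t refs = 0, misses = 0;
	double ratio = -1.0;

	if (refs_fd < 0 || miss_fd < 0)
		goto out;

	ioctl(refs_fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(miss_fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(refs_fd, PERF_EVENT_IOC_ENABLE, 0);
	ioctl(miss_fd, PERF_EVENT_IOC_ENABLE, 0);
	fn(arg);
	ioctl(refs_fd, PERF_EVENT_IOC_DISABLE, 0);
	ioctl(miss_fd, PERF_EVENT_IOC_DISABLE, 0);

	/* Each counter reads back as a single u64 with default read_format. */
	if (read(refs_fd, &refs, sizeof(refs)) == (ssize_t)sizeof(refs) &&
	    read(miss_fd, &misses, sizeof(misses)) == (ssize_t)sizeof(misses))
		ratio = cache_miss_ratio(misses, refs);
out:
	if (refs_fd >= 0)
		close(refs_fd);
	if (miss_fd >= 0)
		close(miss_fd);
	return ratio;
}
```

Usage: `double r = measure_miss_ratio(work, ctx);` where a negative result means the PMU could not be opened for this thread.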
>
>> Yes, the numa_group in NUMA load balancing indicates
>> that several tasks manipulate the same page, which could be an
>> indicator. Besides, if task A frequently wakes up task B, does it
>> mean A and B have the potential to share data? Furthermore, if
>> task A wakes up B via a pipe, it might also indicate that A has
>> something to share with B. I just wonder if we can introduce a
>> structure to gather this information together.
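To make the idea of such a structure concrete, here is a purely hypothetical sketch. None of these names exist in the kernel, and the weights are arbitrary placeholders, not measured values. It only folds the three hints mentioned above (a shared numa_group, plain wakeups, and pipe wakeups) into a single score.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task-pair record of data-sharing hints. */
struct share_hint {
	uint32_t wakeups;	/* times A woke B in the current window */
	uint32_t pipe_wakeups;	/* subset of wakeups delivered via a pipe; <= wakeups */
	bool same_numa_group;	/* NUMA balancing grouped A and B on shared pages */
};

/* Placeholder weighting: a shared numa_group counts most, then pipe
 * wakeups, then other wakeups. Real weights would need measurement. */
static unsigned int share_score(const struct share_hint *h)
{
	unsigned int score = 0;

	if (h->same_numa_group)
		score += 4;
	score += h->pipe_wakeups * 2u;
	score += h->wakeups - h->pipe_wakeups;	/* non-pipe wakeups */
	return score;
}

/* Treat the pair as likely sharing data once the score reaches threshold. */
static bool likely_share_data(const struct share_hint *h, unsigned int threshold)
{
	return share_score(h) >= threshold;
}
```

As Peter points out below, wakeup-based hints can be small relative to the working set, so a score like this could at best be one input among several.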
>
> The wakeup or pipe relation might be small relative to the working set.
> Consider a sharded in memory database, where the query comes in through
> the pipe/socket/wakeup. This query is small, but then it needs to go
> trawl through its memory to find the answer.
>
> Something we *could* look at -- later -- is an interface to create
> thread groups, such that userspace that is clever enough can communicate
> this. But then there is the age-old question: will there be sufficient
> users to justify the maintenance of said interface.

I did not intend to digress too far, but since this has come up, a
wild guess came to me: could the "interface to create thread groups"
here be something like the cgroup v2 filesystem in threaded mode? I
have heard that some cloud users split the threads of a single process
into different thread groups, where the threads within each group
share data with one another (for example, when performing key-value
hashing operations). Using cgroups for this purpose might be overkill,
though, since cgroups are designed for resource partitioning rather
than for identifying tasks that share data. The cgroup hierarchy could
also add some overhead. If there were a single-level thread
partitioning mechanism, similar to the resctrl filesystem, wouldn't
that let us avoid modifying much user-space application code while
minimizing coupling with existing kernel components?

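For comparison, this is roughly what grouping threads looks like with the existing cgroup v2 threaded mode, using the cgroup.type and cgroup.threads files described in the kernel's cgroup v2 documentation. It is a sketch only: the helper names are hypothetical, it assumes cgroup2 is mounted, the caller has write permission, and the process already lives in the parent cgroup (which becomes the threaded domain). It does not enable any controllers; note that the memory controller is a domain controller and cannot be enabled inside a threaded subtree.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write val to <dir>/<file>. Returns 0 or -errno. */
static int write_cgroup_file(const char *dir, const char *file, const char *val)
{
	char path[4096];
	size_t len = strlen(val);
	ssize_t n;
	int fd, ret = 0;

	if (snprintf(path, sizeof(path), "%s/%s", dir, file) >= (int)sizeof(path))
		return -ENAMETOOLONG;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, val, len);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;	/* short write */
	close(fd);
	return ret;
}

/* Hypothetical helper: create (or reuse) a threaded child cgroup and
 * move the calling thread into it by writing its TID. */
static int join_thread_group(const char *group_dir)
{
	char tid[32];
	int ret;

	if (mkdir(group_dir, 0755) < 0 && errno != EEXIST)
		return -errno;
	ret = write_cgroup_file(group_dir, "cgroup.type", "threaded");
	if (ret < 0)
		return ret;
	snprintf(tid, sizeof(tid), "%ld", (long)syscall(SYS_gettid));
	return write_cgroup_file(group_dir, "cgroup.threads", tid);
}
```

Usage: each worker thread that shares a data set would call something like `join_thread_group("/sys/fs/cgroup/app/kv-shard0")` (path hypothetical). Each group is a directory in the hierarchy, and each thread is placed individually by writing its TID.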
thanks,
Chenyu