Message-ID: <eed85200babfcfd43669270912176d38b8cc8f69.camel@linux.intel.com>
Date: Mon, 13 Oct 2025 11:09:41 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: vernhao(郝信) <vernhao@...cent.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K
Prateek Nayak <kprateek.nayak@....com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Hillf Danton
<hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu
<jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan
<tingyin.duan@...il.com>, Len Brown <len.brown@...el.com>, Aubrey Li
<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
<yu.chen.surf@...il.com>, Chen Yu <yu.c.chen@...el.com>, Libo Chen
<libo.chen@...cle.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
<tim.c.chen@...el.com>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Reply: [Internet][PATCH 06/19] sched/fair: Assign preferred LLC ID to processes
On Mon, 2025-10-13 at 17:10 +0800, vernhao(郝信) wrote:
>
> Tim Chen<tim.c.chen@...ux.intel.com> wrote on Sunday, October 12, 2025 at 2:18:
> With cache-aware scheduling enabled, each task is assigned a
> preferred LLC ID. This allows quick identification of the LLC domain
> where the task prefers to run, similar to numa_preferred_nid in
> NUMA balancing.
>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
> include/linux/sched.h | 1 +
> init/init_task.c | 3 +++
> kernel/sched/fair.c | 7 +++++++
> 3 files changed, 11 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d7ddb7ce6c4b..8a5e4038cd5c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1402,6 +1402,7 @@ struct task_struct {
>
> #ifdef CONFIG_SCHED_CACHE
> struct callback_head cache_work;
> + int preferred_llc;
> #endif
>
> #ifdef CONFIG_RSEQ
> diff --git a/init/init_task.c b/init/init_task.c
> index e557f622bd90..5fffbe766f57 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -188,6 +188,9 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
> .numa_group = NULL,
> .numa_faults = NULL,
> #endif
> +#ifdef CONFIG_SCHED_CACHE
> + .preferred_llc = -1,
> +#endif
> #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
> .kasan_depth = 1,
> #endif
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 61c129bde8b6..d6167a029c47 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1312,6 +1312,7 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
> struct mm_struct *mm = p->mm;
> struct mm_sched *pcpu_sched;
> unsigned long epoch;
> + int mm_sched_llc = -1;
>
> if (!sched_cache_enabled())
> return;
> @@ -1342,6 +1343,12 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
> if (mm->mm_sched_cpu != -1)
> mm->mm_sched_cpu = -1;
> }
> +
> + if (mm->mm_sched_cpu != -1)
> + mm_sched_llc = per_cpu(sd_llc_id, mm->mm_sched_cpu);
>
> In high-concurrency multi-threaded scenarios, not all threads handle the same events, so their hot data in the LLC is not completely shared.
> Therefore, if every thread's preferred LLC is set to the LLC pointed to by mm->mm_sched_cpu, this would lead to the incorrect
> assumption that all threads prefer the same LLC, thereby intensifying competition between LLCs.
Yes, that's the reason why we stop aggregating to the preferred LLC once the utilization of that
LLC becomes too high relative to the other LLCs.
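As a rough illustration of that cutoff, here is a toy user-space model. It is not the code from this series: the function name, the utilization units and the imb_pct margin are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code.
 * pref_util:  summed utilization of the preferred LLC (arbitrary units)
 * other_util: summed utilization of the LLC we compare against
 * imb_pct:    how many percent busier the preferred LLC may be
 *             before we stop pulling more tasks into it (assumed knob)
 */
static bool keep_aggregating(unsigned long pref_util,
			     unsigned long other_util,
			     unsigned int imb_pct)
{
	assert(imb_pct <= 100);	/* keep the toy margin sane */

	/* Integer form of: pref_util <= other_util * (1 + imb_pct / 100) */
	return pref_util * 100 <= other_util * (100 + imb_pct);
}
```

With a 20% margin, keep_aggregating(500, 480, 20) stays true, while keep_aggregating(900, 400, 20) turns false and further tasks would be left where they are.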
If you know your threads' characteristics beforehand, in particular which of them
share data, you can probably use cgroup/cpuset
from user space to separate the threads.
There's not enough info in the occupancy data for the OS to group
the threads by data sharing. Perhaps an alternative, if NUMA balancing
is on, is to group tasks by their task NUMA group instead of by mm.
That would incur the page scanning overhead etc. and make
cache-aware scheduling dependent on NUMA balancing.
>
> So I'm wondering, why not move ‘mm->mm_sched_cpu’ to ‘task_struct’, so that each thread can individually track its preferred LLC? What are the losses in doing so?
You would need a way to group related tasks together and put them
on the same LLC. Either group them by mm or by some other means.
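To make the grouping point concrete, below is a small user-space sketch of what the hunk above does. The field names follow the patch, but the struct layouts, the cpu_to_llc[] table (standing in for per_cpu(sd_llc_id, cpu)) and the 8-CPU, 2-LLC topology are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8	/* assumed toy topology: CPUs 0-3 in LLC 0, 4-7 in LLC 1 */

/* Stand-in for per_cpu(sd_llc_id, cpu); hypothetical table. */
static const int cpu_to_llc[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

struct mm_model {
	int mm_sched_cpu;	/* -1 means no preferred CPU for this mm */
};

struct task_model {
	struct mm_model *mm;	/* threads of one process share this */
	int preferred_llc;	/* -1 means no preference */
};

/* Mirror of the patch logic: derive the task's LLC from its mm's CPU. */
static void update_preferred_llc(struct task_model *p)
{
	int mm_sched_llc = -1;
	int cpu;

	assert(p != NULL && p->mm != NULL);

	cpu = p->mm->mm_sched_cpu;
	if (cpu >= 0 && cpu < NR_CPUS)
		mm_sched_llc = cpu_to_llc[cpu];

	if (p->preferred_llc != mm_sched_llc)
		p->preferred_llc = mm_sched_llc;
}
```

Because the preference is read through the shared mm, every thread of the process ends up with the same preferred_llc. Tracking it only per task would drop exactly this link, so some other grouping key would have to take its place.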
Tim
>
> +
> + if (p->preferred_llc != mm_sched_llc)
> + p->preferred_llc = mm_sched_llc;
> }
>
> static void task_tick_cache(struct rq *rq, struct task_struct *p)