Message-ID: <8e1c24c3e0e4cd13935beb8c1cef4b24e642f22b.camel@linux.intel.com>
Date: Tue, 14 Oct 2025 13:13:07 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: vernhao(郝信) <vernhao@...cent.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K
Prateek Nayak <kprateek.nayak@....com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>, haoxing990@...il.com
Cc: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Hillf Danton
<hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu
<jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan
<tingyin.duan@...il.com>, Len Brown <len.brown@...el.com>, Aubrey Li
<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
<yu.chen.surf@...il.com>, Chen Yu <yu.c.chen@...el.com>, Libo Chen
<libo.chen@...cle.com>, Adam Li <adamli@...amperecomputing.com>, Tim Chen
<tim.c.chen@...el.com>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Reply: [Internet] Re: Reply: [PATCH 06/19] sched/fair: Assign
preferred LLC ID to processes
On Tue, 2025-10-14 at 15:07 +0800, vernhao(郝信) wrote:
> Hi Tim,
>
> >
> >
[snip]
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 61c129bde8b6..d6167a029c47 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1312,6 +1312,7 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
> >  	struct mm_struct *mm = p->mm;
> >  	struct mm_sched *pcpu_sched;
> >  	unsigned long epoch;
> > +	int mm_sched_llc = -1;
> >
> >  	if (!sched_cache_enabled())
> >  		return;
> > @@ -1342,6 +1343,12 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
> >  		if (mm->mm_sched_cpu != -1)
> >  			mm->mm_sched_cpu = -1;
> >  	}
> > +
> > +	if (mm->mm_sched_cpu != -1)
> > +		mm_sched_llc = per_cpu(sd_llc_id, mm->mm_sched_cpu);
> >
> > In high-concurrency multi-threaded scenarios, not all threads handle the same events, so their hot data in the LLC is not completely shared.
> > Therefore, if every thread's preferred LLC is set to the LLC pointed to by mm->mm_sched_cpu, this would lead to the incorrect
> > assumption that all threads prefer the same LLC, thereby intensifying competition between LLCs.
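To make the concern concrete, here is a small user-space model of the hunk above. It is not kernel code: the `*_model` names are made up, and the caller-supplied `llc_of_cpu[]` table stands in for the per-CPU `sd_llc_id`. Because the preferred LLC is derived from the shared mm, every thread of the process ends up with the same value:

```c
#include <assert.h>
#include <stddef.h>

/* Made-up stand-ins for mm_struct and task_struct. */
struct mm_model {
	int mm_sched_cpu;	/* preferred CPU of the whole process, or -1 */
};

struct task_model {
	struct mm_model *mm;
	int preferred_llc;	/* LLC id, or -1 for "no preference" */
};

/*
 * Mirror of the quoted hunk: look up the LLC of mm->mm_sched_cpu and
 * store it in the task. llc_of_cpu[] plays the role of the per-CPU
 * sd_llc_id in this model.
 */
void update_preferred_llc_model(struct task_model *p,
				const int *llc_of_cpu, size_t nr_cpus)
{
	int cpu, llc = -1;

	assert(p && p->mm);
	cpu = p->mm->mm_sched_cpu;

	if (cpu >= 0 && (size_t)cpu < nr_cpus)
		llc = llc_of_cpu[cpu];

	if (p->preferred_llc != llc)
		p->preferred_llc = llc;
}
```

Two tasks that point at the same `mm_model` always get the same `preferred_llc`, regardless of what data each of them actually touches.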
>
> Yes, that's the reason why we stop aggregating to the preferred LLC once the utilization of the
> LLC becomes too high relative to the other LLCs.
>
> But this approach is only a compensatory measure after the fact. The threads have already been migrated incorrectly to an LLC that is not their preferred one.
> Is there a better way to handle this situation?
The threads would stay where they were instead of migrating to a preferred LLC
that's overloaded.
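A rough user-space sketch of that rule, under stated assumptions. The struct, the function name and the 20-point margin are all made up for illustration, and the series uses its own criteria for "too high relative to the other LLCs":

```c
#include <assert.h>
#include <stddef.h>

/* Made-up snapshot of one LLC: its id and utilization in percent. */
struct llc_stat {
	int id;
	unsigned int util_pct;
};

/*
 * Illustrative margin only: the preferred LLC counts as overloaded when
 * it is more than this many percentage points busier than the current one.
 */
#define LLC_OVERLOAD_MARGIN_PCT 20u

/*
 * Return the LLC the task should run in: the preferred one if it has
 * room, otherwise the one the task is already in.
 */
int pick_llc(const struct llc_stat *cur, const struct llc_stat *pref)
{
	assert(cur);

	if (!pref || pref->id == cur->id)
		return cur->id;
	if (pref->util_pct > cur->util_pct + LLC_OVERLOAD_MARGIN_PCT)
		return cur->id;	/* stay put, do not pile onto a busy LLC */
	return pref->id;
}
```

The point of the sketch is only that the check happens before the move, so a task is never pulled into an LLC that already looks overloaded.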
>
> If you know your threads' characteristics beforehand, i.e. which of them
> share data, you can probably use cgroup/cpuset
> from user space to separate out the threads.
>
> Yes, this is a solution, and I am trying to implement it.
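For threads whose sharing pattern is known up front, a per-thread variant of that user-space separation can be sketched with `sched_setaffinity()`. This is a minimal example, assuming the caller already knows which CPUs share an LLC; the helper name `pin_self_to_cpus` is made up:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for cpu_set_t, CPU_* and sched_setaffinity() */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Restrict the calling thread to the given CPUs, e.g. the CPUs that
 * share one LLC. Returns 0 on success, -1 on error.
 */
int pin_self_to_cpus(const int *cpus, size_t n)
{
	cpu_set_t set;
	size_t i;

	assert(cpus || n == 0);

	CPU_ZERO(&set);
	for (i = 0; i < n; i++) {
		if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
			return -1;
		CPU_SET(cpus[i], &set);
	}
	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Each worker thread would call this once at startup with the CPU list of the LLC its group is meant to share.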
>
> There's not enough info from occupancy data for the OS to group
> the threads by data sharing. Perhaps an alternative, if NUMA balancing
> is on, is to group tasks by their task numa group instead of by mm.
>
> This may not be a good solution either, especially for virtual machine scenarios, which have no NUMA.
If you are in a VM, the cache topology may not correspond to
the real CPU cache topology, and you probably should not enable cache
aware scheduling inside the guest unless you are doing some explicit
binding of VCPUs.
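One quick sanity check inside a guest is to look at which CPUs the kernel reports as sharing a cache. Below is a minimal sketch that counts the CPUs in a sysfs-style CPU list such as the one in `/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list`. It assumes `index3` is the LLC, which is common but not guaranteed (the `level` file next to it is authoritative). The helper names and the sanity bound are made up:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define CPULIST_MAX_CPU (1L << 20)	/* arbitrary sanity bound */

/*
 * Count the CPUs named in a CPU list such as "0-3,8-11".
 * Returns the count, or -1 if the string is malformed.
 */
int cpulist_count(const char *s)
{
	int count = 0;

	assert(s);
	while (*s && *s != '\n') {
		char *end;
		long lo, hi;

		errno = 0;
		lo = strtol(s, &end, 10);
		if (end == s || errno || lo < 0 || lo > CPULIST_MAX_CPU)
			return -1;
		hi = lo;
		if (*end == '-') {
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || errno || hi < lo || hi > CPULIST_MAX_CPU)
				return -1;
		}
		count += (int)(hi - lo + 1);
		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}
	return count;
}

/* Read the first line of a sysfs CPU list file and count its CPUs. */
int cpulist_file_count(const char *path)
{
	char buf[4096];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = cpulist_count(buf);
	fclose(f);
	return ret;
}
```

Calling `cpulist_file_count()` on cpu0's `shared_cpu_list` gives the number of CPUs the guest believes share that cache, which can then be compared with the host's view.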
>
> That would incur page scanning overhead etc. and make
> cache aware scheduling dependent on NUMA balancing.
>
>
> >
> > So I'm wondering, why not move ‘mm->mm_sched_cpu’ to ‘task_struct’, so that each thread can individually track its preferred LLC? What are the losses in doing so?
>
> You would need a way to group related tasks together and put them
> on the same LLC. Either group them by mm or some other means.
>
> Yes, you are right. How about this: besides 'mm', add cgroup support too?
Doing cgroup may not solve the original issue you brought
up, where a process may have a group of tasks wanting to go
into one cache and another group of tasks going to another cache.
I could be wrong but I don't think you can split up tasks in a process
in cgroup v2 to different cgroups.
Also the cgroup folks are quite resistant to adding new knobs.
Tim
>
> >
> > +
> > +	if (p->preferred_llc != mm_sched_llc)
> > +		p->preferred_llc = mm_sched_llc;
> >  }
> >
> > static void task_tick_cache(struct rq *rq, struct task_struct *p)