Message-ID: <ea4b4d45-0fa0-4be9-b6cf-706a1c9fc5f2@intel.com>
Date: Fri, 26 Sep 2025 22:30:58 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Adam Li <adamli@...amperecomputing.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Libo Chen
<libo.chen@...cle.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, "Hillf
Danton" <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
"Jianyong Wu" <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len
Brown <len.brown@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>, Aubrey Li
<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
<yu.chen.surf@...il.com>, <linux-kernel@...r.kernel.org>, Peter Zijlstra
<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K Prateek Nayak
<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>
Subject: Re: [RFC PATCH v4 26/28] sched: Do not enable cache aware scheduling
for process with large RSS
On 9/26/2025 4:48 PM, Adam Li wrote:
> Hi Chen Yu,
>
> Thanks for your work.
> I tested the patch set on AmpereOne CPU with 192 cores.
>
> With CONFIG_SCHED_CLUSTER enabled, and with certain firmware settings,
> every eight cores will be grouped into a 'cluster' schedule domain
> with 'SD_SHARE_LLC' flag.
> However, these eight cores do *not* share an L3 cache in this setup.
>
> In exceed_llc_capacity() of this patch, we have 'llc = l3_leaf->size';
> 'llc' will be zero if there is *no* L3 cache.
> So exceed_llc_capacity() will return true and 'Cache Aware Scheduling' will
> not work. Please see the details below.
>
> I read in patch 01/28 "sched: Cache aware load-balancing" [1],
> Peter mentioned:
> "It is an attempt at modelling cache affinity -- and while the patch
> really only targets LLC, it could very well be extended to also apply to
> clusters (L2). Specifically any case of multiple cache domains inside a
> node".
>
> Do you have any idea how we can apply the cache aware load-balancing
> to clusters? The cores in the cluster may share L2 or LLC tags.

My understanding is that if there is no L3 cache, the L2 becomes the
last-level cache (LLC), so we don't need L2-specific changes to the
scheduling code. However, as you observed, there are some cases that
need to be taken care of. For example, Patch 8 needs to be fixed so
that it does not always retrieve the size of the L3 cache.
On the other hand, if the system has both an L2 cluster and an L3, the
code might need to be changed if we want to perform L2 cache aggregation
rather than L3 cache aggregation.
>
> [1]: https://lore.kernel.org/all/9157186cf9e3fd541f62c637579ff736b3704c51.1754712565.git.tim.c.chen@linux.intel.com/
>
> On 8/9/2025 1:08 PM, Chen Yu wrote:
>> +
>> + l3_leaf = this_cpu_ci->info_list + 3;
>> + llc = l3_leaf->size;
>> +
> In some arm64 CPU topologies, cores can be grouped into 'clusters'.
> Cores in a cluster may not share an L3 cache, so 'l3_leaf->size'
> will be 0.
>
> It looks like we assume the LLC is the L3 cache?
Right, but the LLC is not always the L3; this needs a fix.
>
> Can we skip the exceed_llc_capacity() check if there is no L3?
I thought we should return the size of L2 instead, no?
thanks,
Chenyu
> Like this draft patch:
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1227,6 +1227,8 @@ static bool exceed_llc_capacity(struct mm_struct *mm, int cpu)
>
> l3_leaf = this_cpu_ci->info_list + 3;
> llc = l3_leaf->size;
> + if (!llc)
> + return false;