linux-kernel - Re: [RFC PATCH v4 26/28] sched: Do not enable cache aware scheduling for process with large RSS

Message-ID: <ea4b4d45-0fa0-4be9-b6cf-706a1c9fc5f2@intel.com>
Date: Fri, 26 Sep 2025 22:30:58 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Adam Li <adamli@...amperecomputing.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
 Rostedt" <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Libo Chen
	<libo.chen@...cle.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, "Hillf
 Danton" <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
	"Jianyong Wu" <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
	Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len
 Brown <len.brown@...el.com>, Tim Chen <tim.c.chen@...ux.intel.com>, Aubrey Li
	<aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>, Chen Yu
	<yu.chen.surf@...il.com>, <linux-kernel@...r.kernel.org>, Peter Zijlstra
	<peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, K Prateek Nayak
	<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>
Subject: Re: [RFC PATCH v4 26/28] sched: Do not enable cache aware scheduling
 for process with large RSS

On 9/26/2025 4:48 PM, Adam Li wrote:
> Hi Chen Yu,
> 
> Thanks for your work.
> I tested the patch set on AmpereOne CPU with 192 cores.
> 
> With CONFIG_SCHED_CLUSTER enabled, and with certain firmware setting,
> every eight cores will be grouped into a 'cluster' schedule domain
> with 'SD_SHARE_LLC' flag.
> However, these eight cores do *not* share L3 cache in this setup.
> 
> In exceed_llc_capacity() of this patch, we have 'llc = l3_leaf->size',
> 'llc' will be zero if there is *no* L3 cache.
> So exceed_llc_capacity() will be true and 'Cache Aware Scheduling' will
> not work. Please see details below.
> 
> I read in patch 01/28 "sched: Cache aware load-balancing" [1],
> Peter mentioned:
> "It is an attempt at modelling cache affinity -- and while the patch
> really only targets LLC, it could very well be extended to also apply to
> clusters (L2). Specifically any case of multiple cache domains inside a
> node".
> 
> Do you have any idea how we can apply the cache aware load-balancing
> to clusters? The cores in the cluster may share L2 or LLC tags.

My understanding is that if there is no L3 cache, then the L2 becomes
the LLC. We don’t need to modify the code specific to L2-aware scheduling
because the L2 is now the last-level cache (LLC). However, as you observed,
there are some cases that need to be taken care of. For example, Patch 8
needs to be fixed so that it does not always retrieve the cache size of
L3.
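
To illustrate the point about not hard-coding the L3 leaf, here is a
minimal userspace sketch (not kernel code). The struct cache_leaf type,
the cache_kind enum and llc_size_bytes() are hypothetical names made up
for this example. They only mimic the idea of walking a CPU's cache
leaves and taking the size of the highest level that reports a non-zero
size, so that L2 is used when no L3 is present.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU cache leaf; not the kernel's struct. */
enum cache_kind { CACHE_DATA, CACHE_INST, CACHE_UNIFIED };

struct cache_leaf {
	unsigned int level;	/* 1 = L1, 2 = L2, 3 = L3 */
	enum cache_kind kind;
	unsigned int size;	/* bytes; 0 means absent or not reported */
};

/*
 * Return the size of the highest-level data or unified cache that
 * reports a non-zero size, or 0 if no such leaf exists.
 */
static unsigned int llc_size_bytes(const struct cache_leaf *leaves, size_t n)
{
	unsigned int best_level = 0, size = 0;
	size_t i;

	assert(leaves != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Skip instruction caches and leaves without a size. */
		if (leaves[i].kind == CACHE_INST || leaves[i].size == 0)
			continue;
		if (leaves[i].level > best_level) {
			best_level = leaves[i].level;
			size = leaves[i].size;
		}
	}
	return size;
}
```

With leaves for L1d, L1i, L2 and an L3 entry whose size is 0, this
returns the L2 size. When the L3 entry has a size, it returns that
instead.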

On the other hand, if the system has both an L2 cluster and an L3, the
code might need to be changed if we want to perform L2 cache aggregation
rather than L3 cache aggregation.

> 
> [1]: https://lore.kernel.org/all/9157186cf9e3fd541f62c637579ff736b3704c51.1754712565.git.tim.c.chen@linux.intel.com/
> 
> On 8/9/2025 1:08 PM, Chen Yu wrote:
>> +
>> +	l3_leaf = this_cpu_ci->info_list + 3;
>> +	llc = l3_leaf->size;
>> +
> For some arm64 CPU topology, cores can be grouped into 'cluster'.
> Cores in a cluster may not share L3 cache. 'l3_leaf->size'
> will be 0.
> 
> It looks like we assume the LLC is the L3 cache?

Right, but the LLC is not always L3; this needs a fix.

> 
> Can we skip exceed_llc_capacity() check if no L3?

I thought we should return the size of L2 instead, no?
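
A rough userspace sketch of how the options differ (not kernel code).
All three function names are hypothetical. The comparison
'llc <= rss' is an assumption about the shape of the check, chosen only
because it matches the reported behaviour that a zero LLC size makes
exceed_llc_capacity() true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed current shape: a zero L3 size makes this true for any RSS. */
static bool exceeds_current(uint64_t rss_bytes, uint64_t l3_bytes)
{
	return l3_bytes <= rss_bytes;
}

/* Early return when no L3 size is reported: the check is skipped. */
static bool exceeds_skip_if_no_l3(uint64_t rss_bytes, uint64_t l3_bytes)
{
	if (!l3_bytes)
		return false;
	return l3_bytes <= rss_bytes;
}

/* Fall back to the L2 size when no L3 size is reported. */
static bool exceeds_l2_fallback(uint64_t rss_bytes, uint64_t l3_bytes,
				uint64_t l2_bytes)
{
	uint64_t llc = l3_bytes ? l3_bytes : l2_bytes;

	return llc <= rss_bytes;
}
```

Skipping the check never filters out a large process on a system
without L3, while the L2 fallback still compares the footprint against
the cache the cluster actually shares.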

thanks,
Chenyu

> Like this draft patch:
> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1227,6 +1227,8 @@ static bool exceed_llc_capacity(struct mm_struct *mm, int cpu)
> 
>          l3_leaf = this_cpu_ci->info_list + 3;
>          llc = l3_leaf->size;
> +       if (!llc)
> +               return false;