Message-ID: <c047e50b-13f4-4234-8590-0f82314bcb8f@intel.com>
Date: Thu, 23 Oct 2025 14:55:51 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Tim Chen
<tim.c.chen@...ux.intel.com>
CC: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, "K
Prateek Nayak" <kprateek.nayak@....com>, "Gautham R . Shenoy"
<gautham.shenoy@....com>, Vincent Guittot <vincent.guittot@...aro.org>, "Juri
Lelli" <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, "Mel
Gorman" <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, "Hillf
Danton" <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
"Jianyong Wu" <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len
Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li
<adamli@...amperecomputing.com>, Tim Chen <tim.c.chen@...el.com>,
<linux-kernel@...r.kernel.org>, <haoxing990@...il.com>
Subject: Re: [PATCH 17/19] sched/fair: Disable cache aware scheduling for
processes with high thread counts
On 10/23/2025 1:21 AM, Madadi Vineeth Reddy wrote:
> On 11/10/25 23:54, Tim Chen wrote:
>> From: Chen Yu <yu.c.chen@...el.com>
>>
>> If the number of active threads within the process
>> exceeds the number of cores in the LLC (that is, the
>> number of CPUs divided by the number of SMT siblings),
>> do not enable cache-aware scheduling. With too many
>> threads present, there is a risk of cache contention
>> within the preferred LLC.
>>
>> Reported-by: K Prateek Nayak <kprateek.nayak@....com>
>> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
>> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
>> ---
>> kernel/sched/fair.c | 27 +++++++++++++++++++++++++--
>> 1 file changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 79d109f8a09f..6b8eace79eee 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1240,6 +1240,18 @@ static inline int pref_llc_idx(struct task_struct *p)
>> return llc_idx(p->preferred_llc);
>> }
>>
>> +static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
>> +{
>> + int smt_nr = 1;
>> +
>> +#ifdef CONFIG_SCHED_SMT
>> + if (sched_smt_active())
>> + smt_nr = cpumask_weight(cpu_smt_mask(cpu));
>> +#endif
>> +
>> + return ((mm->nr_running_avg * smt_nr) > per_cpu(sd_llc_size, cpu));
>
> On Power10 and Power11, which have SMT8 and an LLC size of 4, this would
> disable cache-aware scheduling even for a single thread.
>
Using smt_nr was mainly due to concerns about introducing regressions
on Power, as discussed in v3:
https://lore.kernel.org/all/8f6c7c69-b6b3-4c82-8db3-96757f09245f@linux.ibm.com/
and
https://lore.kernel.org/all/ddb9d558-d114-41db-9d4b-296fc2ecdbb4@linux.ibm.com/
It seems that aggregating tasks on an LLC with many SMT siblings per
core and a small LLC size would risk cache contention. Additionally,
with patch [19/19], users can tune
/sys/kernel/debug/sched/llc_aggr_tolerance to adjust the threshold:

	return ((mm->nr_running_avg * smt_nr) >
		(scale * per_cpu(sd_llc_size, cpu)));
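
To make the arithmetic concrete, here is a minimal userspace sketch of
that check, not the kernel implementation: smt_nr stands in for
cpumask_weight(cpu_smt_mask(cpu)), llc_size for per_cpu(sd_llc_size,
cpu), and scale for the llc_aggr_tolerance tunable (left at 1 here for
illustration). The Power10 values (SMT8, 4 CPUs per LLC) come from
your report; the x86 values are assumptions.

	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative stand-in for the kernel check. */
	static bool exceed_llc_nr(int nr_running_avg, int smt_nr,
				  int llc_size, int scale)
	{
		return (nr_running_avg * smt_nr) > (scale * llc_size);
	}

	int main(void)
	{
		/* Power10: SMT8, LLC spans 4 CPUs: 1 * 8 > 4. */
		printf("Power10, 1 thread:  %s\n",
		       exceed_llc_nr(1, 8, 4, 1) ? "disabled" : "enabled");
		/* Assumed x86 box: SMT2, LLC spans 32 CPUs: 16 * 2 > 32 is false. */
		printf("x86, 16 threads:    %s\n",
		       exceed_llc_nr(16, 2, 32, 1) ? "disabled" : "enabled");
		return 0;
	}

Raising scale via llc_aggr_tolerance relaxes the threshold, so an SMT8
platform with a small LLC could opt back in to aggregation.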
> Also, llc_overload_pct already ensures the load on the preferred LLC doesn't
> exceed a certain capacity. Why is this exceed_llc_nr() check needed? Won't the
> existing overload_pct naturally prevent excessive task aggregation by blocking
> migrations when the destination LLC reaches ~50% utilization?
>
exceed_llc_nr() is used because some short-duration tasks can generate
low utilization yet still cause cache contention (for some reason,
util_avg cannot track that properly), schbench being one example.
Therefore, we inhibit task aggregation when the number of active
threads is large.
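
As a back-of-the-envelope illustration of that gap (hypothetical
numbers, not measured schbench data):

	#include <stdio.h>

	/*
	 * Short-duration tasks wake, run briefly, and sleep, so each
	 * contributes little utilization even when many of them are
	 * active and keep touching cache lines in the LLC.
	 */
	int main(void)
	{
		int nr_threads = 64;       /* active threads in the process */
		double run_us = 50.0;      /* each runs ~50us per wakeup ... */
		double period_us = 1000.0; /* ... once per 1ms */

		double util_pct = run_us / period_us * 100.0;

		printf("per-thread utilization ~%.0f%%\n", util_pct);
		printf("yet %d threads still share the LLC\n", nr_threads);
		/*
		 * A ~50%-utilization gate (llc_overload_pct) would not
		 * trigger here, while a thread-count gate such as
		 * exceed_llc_nr() would.
		 */
		return 0;
	}
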
thanks,
Chenyu