Message-ID: <1a63ee2a-9c1e-4aa3-adb0-012e0eae5dcf@linux.ibm.com>
Date: Wed, 22 Oct 2025 22:51:12 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Chen Yu
<yu.c.chen@...el.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>,
Len Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
Libo Chen <libo.chen@...cle.com>,
Adam Li <adamli@...amperecomputing.com>,
Tim Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [PATCH 17/19] sched/fair: Disable cache aware scheduling for
processes with high thread counts
On 11/10/25 23:54, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@...el.com>
>
> If the number of active threads within the process
> exceeds the number of cores in the LLC (LLC CPUs
> divided by the SMT count), do not enable cache-aware
> scheduling. This is because there is a risk of cache
> contention within the preferred LLC when too many
> threads are present.
>
> Reported-by: K Prateek Nayak <kprateek.nayak@....com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
> kernel/sched/fair.c | 27 +++++++++++++++++++++++++--
> 1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 79d109f8a09f..6b8eace79eee 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1240,6 +1240,18 @@ static inline int pref_llc_idx(struct task_struct *p)
> return llc_idx(p->preferred_llc);
> }
>
> +static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
> +{
> + int smt_nr = 1;
> +
> +#ifdef CONFIG_SCHED_SMT
> + if (sched_smt_active())
> + smt_nr = cpumask_weight(cpu_smt_mask(cpu));
> +#endif
> +
> + return ((mm->nr_running_avg * smt_nr) > per_cpu(sd_llc_size, cpu));

On Power10 and Power11, which have SMT8 cores and an LLC spanning 4 CPUs,
this would disable cache-aware scheduling even for a single thread.
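
To make the arithmetic concrete, here is a minimal standalone sketch of the
exceed_llc_nr() condition; the SMT width of 8 and sd_llc_size of 4 are the
assumed Power10/Power11 topology values from the example above, and this
userspace helper is only an illustration, not the patch's code:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy stand-in for exceed_llc_nr(): compare the process's average
 * number of running threads, scaled by the SMT width, against the
 * number of CPUs in the LLC (sd_llc_size).
 */
static bool exceed_llc_nr(int nr_running_avg, int smt_nr, int llc_size)
{
	return (nr_running_avg * smt_nr) > llc_size;
}

int main(void)
{
	/* Assumed Power10/Power11 topology: SMT8 cores, LLC of 4 CPUs */
	int smt_nr = 8, llc_size = 4;

	/* A single running thread already trips the check: 1 * 8 > 4 */
	printf("1 thread exceeds LLC: %d\n",
	       exceed_llc_nr(1, smt_nr, llc_size));
	return 0;
}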

Also, llc_overload_pct already ensures the load on the preferred LLC doesn't
exceed a certain capacity. Why is this exceed_llc_nr() check needed? Won't the
existing overload_pct naturally prevent excessive task aggregation by blocking
migrations once the destination LLC reaches ~50% utilization?
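
For reference, the kind of utilization gate the question refers to can be
sketched in a few lines of standalone C; the llc_overload_pct name comes from
the discussion above, while the 50% default, the helper name, and the capacity
numbers here are assumptions for illustration only, not code from this series:

#include <stdbool.h>
#include <stdio.h>

/* Assumed default, matching the ~50% figure mentioned above */
static int llc_overload_pct = 50;

/*
 * Hypothetical illustration: treat the destination LLC as overloaded,
 * and thus block further aggregation, once its utilization crosses
 * llc_overload_pct percent of its capacity.
 */
static bool dst_llc_overloaded(unsigned long util, unsigned long capacity)
{
	return util * 100 > capacity * llc_overload_pct;
}

int main(void)
{
	unsigned long capacity = 4 * 1024;	/* 4 CPUs, capacity 1024 each */

	printf("25%% util blocked: %d\n",
	       dst_llc_overloaded(capacity / 4, capacity));
	printf("75%% util blocked: %d\n",
	       dst_llc_overloaded(3 * capacity / 4, capacity));
	return 0;
}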

Thanks,
Madadi Vineeth Reddy

> +}
> +
> static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
> {
> int pref_llc;
> @@ -1385,10 +1397,12 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
>
> /*
> * If this task hasn't hit task_cache_work() for a while, or it
> - * has only 1 thread, invalidate its preferred state.
> + * has only 1 thread, or has too many active threads, invalidate
> + * its preferred state.
> */
> if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_LLC_AFFINITY_TIMEOUT ||
> - get_nr_threads(p) <= 1) {
> + get_nr_threads(p) <= 1 ||
> + exceed_llc_nr(mm, cpu_of(rq))) {
> if (mm->mm_sched_cpu != -1)
> mm->mm_sched_cpu = -1;
> }
> @@ -1467,6 +1481,11 @@ static void __no_profile task_cache_work(struct callback_head *work)
> if (p->flags & PF_EXITING)
> return;
>
> + if (get_nr_threads(p) <= 1) {
> + mm->mm_sched_cpu = -1;
> + return;
> + }
> +
> if (!zalloc_cpumask_var(&cpus, GFP_KERNEL))
> return;
>
> @@ -9826,6 +9845,10 @@ static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
> if (cpu < 0 || cpus_share_cache(src_cpu, dst_cpu))
> return mig_unrestricted;
>
> + /* skip cache aware load balance for single/too many threads */
> + if (get_nr_threads(p) <= 1 || exceed_llc_nr(mm, dst_cpu))
> + return mig_unrestricted;
> +
> if (cpus_share_cache(dst_cpu, cpu))
> to_pref = true;
> else if (cpus_share_cache(src_cpu, cpu))