Message-ID: <1a63ee2a-9c1e-4aa3-adb0-012e0eae5dcf@linux.ibm.com>
Date: Wed, 22 Oct 2025 22:51:12 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Chen Yu
<yu.c.chen@...el.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>,
Len Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
Libo Chen <libo.chen@...cle.com>,
Adam Li <adamli@...amperecomputing.com>,
Tim Chen <tim.c.chen@...el.com>, linux-kernel@...r.kernel.org,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [PATCH 17/19] sched/fair: Disable cache aware scheduling for
processes with high thread counts
On 11/10/25 23:54, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@...el.com>
>
> If the number of active threads within the process
> exceeds the number of cores in the LLC (LLC CPUs
> divided by the SMT count), do not enable cache-aware
> scheduling. This is because there is a risk of cache
> contention within the preferred LLC when too many
> threads are present.
>
> Reported-by: K Prateek Nayak <kprateek.nayak@....com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
> kernel/sched/fair.c | 27 +++++++++++++++++++++++++--
> 1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 79d109f8a09f..6b8eace79eee 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1240,6 +1240,18 @@ static inline int pref_llc_idx(struct task_struct *p)
> return llc_idx(p->preferred_llc);
> }
>
> +static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
> +{
> + int smt_nr = 1;
> +
> +#ifdef CONFIG_SCHED_SMT
> + if (sched_smt_active())
> + smt_nr = cpumask_weight(cpu_smt_mask(cpu));
> +#endif
> +
> + return ((mm->nr_running_avg * smt_nr) > per_cpu(sd_llc_size, cpu));

On Power10 and Power11, which have SMT8 cores and an LLC spanning 4 CPUs,
this would disable cache-aware scheduling even for a single thread.
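
To make the arithmetic concrete, here is a minimal standalone sketch of the
exceed_llc_nr() condition; the SMT width of 8 and sd_llc_size of 4 are the
assumed Power10/Power11 topology values from the example above, and this
userspace helper is only an illustration, not the patch's code:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy stand-in for exceed_llc_nr(): compare the process's average
 * number of running threads, scaled by the SMT width, against the
 * number of CPUs in the LLC (sd_llc_size).
 */
static bool exceed_llc_nr(int nr_running_avg, int smt_nr, int llc_size)
{
	return (nr_running_avg * smt_nr) > llc_size;
}

int main(void)
{
	/* Assumed Power10/Power11 topology: SMT8 cores, LLC of 4 CPUs */
	int smt_nr = 8, llc_size = 4;

	/* A single running thread already trips the check: 1 * 8 > 4 */
	printf("1 thread exceeds LLC: %d\n",
	       exceed_llc_nr(1, smt_nr, llc_size));
	return 0;
}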

Also, llc_overload_pct already ensures the load on the preferred LLC doesn't
exceed a certain capacity. Why is this exceed_llc_nr() check needed? Won't the
existing overload_pct naturally prevent excessive task aggregation by blocking
migrations once the destination LLC reaches ~50% utilization?
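
For reference, the kind of utilization gate the question refers to can be
sketched in a few lines of standalone C; the llc_overload_pct name comes from
the discussion above, while the 50% default, the helper name, and the capacity
numbers here are assumptions for illustration only, not code from this series:

#include <stdbool.h>
#include <stdio.h>

/* Assumed default, matching the ~50% figure mentioned above */
static int llc_overload_pct = 50;

/*
 * Hypothetical illustration: treat the destination LLC as overloaded,
 * and thus block further aggregation, once its utilization crosses
 * llc_overload_pct percent of its capacity.
 */
static bool dst_llc_overloaded(unsigned long util, unsigned long capacity)
{
	return util * 100 > capacity * llc_overload_pct;
}

int main(void)
{
	unsigned long capacity = 4 * 1024;	/* 4 CPUs, capacity 1024 each */

	printf("25%% util blocked: %d\n",
	       dst_llc_overloaded(capacity / 4, capacity));
	printf("75%% util blocked: %d\n",
	       dst_llc_overloaded(3 * capacity / 4, capacity));
	return 0;
}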

Thanks,
Madadi Vineeth Reddy

> +}
> +
> static void account_llc_enqueue(struct rq *rq, struct task_struct *p)
> {
> int pref_llc;
> @@ -1385,10 +1397,12 @@ void account_mm_sched(struct rq *rq, struct task_struct *p, s64 delta_exec)
>
> /*
> * If this task hasn't hit task_cache_work() for a while, or it
> - * has only 1 thread, invalidate its preferred state.
> + * has only 1 thread, or has too many active threads, invalidate
> + * its preferred state.
> */
> if (epoch - READ_ONCE(mm->mm_sched_epoch) > EPOCH_LLC_AFFINITY_TIMEOUT ||
> - get_nr_threads(p) <= 1) {
> + get_nr_threads(p) <= 1 ||
> + exceed_llc_nr(mm, cpu_of(rq))) {
> if (mm->mm_sched_cpu != -1)
> mm->mm_sched_cpu = -1;
> }
> @@ -1467,6 +1481,11 @@ static void __no_profile task_cache_work(struct callback_head *work)
> if (p->flags & PF_EXITING)
> return;
>
> + if (get_nr_threads(p) <= 1) {
> + mm->mm_sched_cpu = -1;
> + return;
> + }
> +
> if (!zalloc_cpumask_var(&cpus, GFP_KERNEL))
> return;
>
> @@ -9826,6 +9845,10 @@ static enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
> if (cpu < 0 || cpus_share_cache(src_cpu, dst_cpu))
> return mig_unrestricted;
>
> + /* skip cache aware load balance for single/too many threads */
> + if (get_nr_threads(p) <= 1 || exceed_llc_nr(mm, dst_cpu))
> + return mig_unrestricted;
> +
> if (cpus_share_cache(dst_cpu, cpu))
> to_pref = true;
> else if (cpus_share_cache(src_cpu, cpu))