Message-ID: <20251210155848.GV3707891@noisy.programming.kicks-ass.net>
Date: Wed, 10 Dec 2025 16:58:48 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>,
Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>,
Vern Hao <vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>,
Len Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
Chen Yu <yu.c.chen@...el.com>,
Adam Li <adamli@...amperecomputing.com>,
Aaron Lu <ziqianlu@...edance.com>, Tim Chen <tim.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 14/23] sched/cache: Consider LLC preference when
selecting tasks for load balancing
On Wed, Dec 03, 2025 at 03:07:33PM -0800, Tim Chen wrote:
> Currently, task selection from the busiest runqueue ignores LLC
> preferences. Reorder tasks in the busiest queue to prioritize selection
> as follows:
>
> 1. Tasks preferring the destination CPU's LLC
> 2. Tasks with no LLC preference
> 3. Tasks preferring an LLC different from their current one
> 4. Tasks preferring the LLC they are currently on
>
> This improves the likelihood that tasks are migrated to their
> preferred LLC.
>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
>
> Notes:
> v1->v2: No change.
>
> kernel/sched/fair.c | 66 ++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index aed3fab98d7c..dd09a816670e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10092,6 +10092,68 @@ static struct task_struct *detach_one_task(struct lb_env *env)
> return NULL;
> }
>
> +#ifdef CONFIG_SCHED_CACHE
> +/*
> + * Prepare lists to detach tasks in the following order:
> + * 1. tasks that prefer dst cpu's LLC
> + * 2. tasks that have no preference in LLC
> + * 3. tasks that prefer LLC other than the ones they are on
> + * 4. tasks that prefer the LLC that they are currently on.
> + */
> +static struct list_head
> +*order_tasks_by_llc(struct lb_env *env, struct list_head *tasks)
> +{
> + struct task_struct *p;
> + LIST_HEAD(pref_old_llc);
> + LIST_HEAD(pref_new_llc);
> + LIST_HEAD(no_pref_llc);
> + LIST_HEAD(pref_other_llc);
> +
> + if (!sched_cache_enabled())
> + return tasks;
> +
> + if (cpus_share_cache(env->dst_cpu, env->src_cpu))
> + return tasks;
> +
> + while (!list_empty(tasks)) {
> + p = list_last_entry(tasks, struct task_struct, se.group_node);
> +
> + if (p->preferred_llc == llc_id(env->dst_cpu)) {
> + list_move(&p->se.group_node, &pref_new_llc);
> + continue;
> + }
> +
> + if (p->preferred_llc == llc_id(env->src_cpu)) {
> + list_move(&p->se.group_node, &pref_old_llc);
> + continue;
> + }
> +
> + if (p->preferred_llc == -1) {
> + list_move(&p->se.group_node, &no_pref_llc);
> + continue;
> + }
> +
> + list_move(&p->se.group_node, &pref_other_llc);
> + }
> +
> + /*
> + * We detach tasks from the list tail in detach_tasks(). Put
> + * the tasks to be chosen first at the end of the list.
> + */
> + list_splice(&pref_new_llc, tasks);
> + list_splice(&no_pref_llc, tasks);
> + list_splice(&pref_other_llc, tasks);
> + list_splice(&pref_old_llc, tasks);
> + return tasks;
> +}
> @@ -10119,6 +10181,8 @@ static int detach_tasks(struct lb_env *env)
> if (env->imbalance <= 0)
> return 0;
>
> + tasks = order_tasks_by_llc(env, &env->src_rq->cfs_tasks);
> +
> while (!list_empty(tasks)) {
> /*
> * We don't want to steal all, otherwise we may be treated likewise,
Humrph. So NUMA balancing does this differently. It skips over tasks
that would degrade locality in can_migrate_task(); and only if
nr_balance_failed is high enough do we ignore that.
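
For reference, the relevant shape in can_migrate_task() is roughly the
following (a simplified sketch; the surrounding checks and schedstat
accounting are elided):

	/*
	 * Prefer not to migrate a task that is cache hot or whose
	 * NUMA locality would degrade ...
	 */
	tsk_cache_hot = migrate_degrades_locality(p, env);
	if (tsk_cache_hot == -1)
		tsk_cache_hot = task_hot(p, env);

	/*
	 * ... unless enough balance attempts have already failed
	 * (nr_balance_failed > cache_nice_tries), in which case we
	 * migrate it anyway.
	 */
	if (tsk_cache_hot <= 0 ||
	    env->sd->nr_balance_failed > env->sd->cache_nice_tries)
		return 1;

	return 0;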
Also, if there is a significant number of tasks on the list, this gets
in the way of things like loop_break, since it walks and sorts the
whole list unconditionally before a single task is detached.
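
For context, detach_tasks() deliberately bounds its work per pass,
roughly like so (a sketch; the actual detach logic in the loop body is
elided):

	while (!list_empty(tasks)) {
		env->loop++;
		/* We've more or less seen every task there is, call it quits */
		if (env->loop > env->loop_max)
			break;

		/* take a breather every nr_migrate tasks */
		if (env->loop > env->loop_break) {
			env->loop_break += SCHED_NR_MIGRATE_BREAK;
			env->flags |= LBF_NEED_BREAK;
			break;
		}

		p = list_last_entry(tasks, struct task_struct, se.group_node);
		...
	}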
Bah, this feels like there is a sane way to integrate all this, but it
seems to escape me at the moment. I'll ponder it a bit more.