Message-ID: <20251210163013.GW3707891@noisy.programming.kicks-ass.net>
Date: Wed, 10 Dec 2025 17:30:13 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>,
Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>,
Vern Hao <vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>,
Len Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
Chen Yu <yu.c.chen@...el.com>,
Adam Li <adamli@...amperecomputing.com>,
Aaron Lu <ziqianlu@...edance.com>, Tim Chen <tim.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 15/23] sched/cache: Respect LLC preference in task
migration and detach
On Wed, Dec 03, 2025 at 03:07:34PM -0800, Tim Chen wrote:
> @@ -10025,6 +10025,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> if (env->flags & LBF_ACTIVE_LB)
> return 1;
>
> +#ifdef CONFIG_SCHED_CACHE
> + if (sched_cache_enabled() &&
> + can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid &&
> + !task_has_sched_core(p))
> + return 0;
> +#endif
This seems wrong:
- it does not let nr_balance_failed override things;
- it takes precedence over migrate_degrades_locality(); you really want
  to migrate towards the preferred NUMA node over staying on your LLC.
That is, this really wants to be done after migrate_degrades_locality()
and only if degrades == 0 or something.
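Roughly something like the below -- a sketch only, with stub variables
standing in for the real kernel helpers (migrate_degrades_locality(),
can_migrate_llc_task(), env->sd->nr_balance_failed); the point is the
ordering, not the exact code:

```c
/* Sketch: run the LLC veto only after migrate_degrades_locality(),
 * only for a locality-neutral move, and let nr_balance_failed
 * override it.  All identifiers here are illustrative stand-ins. */
#include <assert.h>
#include <stdbool.h>

enum { mig_allow, mig_forbid };

static int degrades;          /* result of migrate_degrades_locality() */
static int llc_verdict;       /* result of can_migrate_llc_task() */
static int nr_balance_failed; /* env->sd->nr_balance_failed */

static bool can_migrate(void)
{
	/* NUMA first: a move towards the preferred node (degrades < 0)
	 * must not be vetoed on LLC grounds. */
	if (degrades < 0)
		return true;

	/* Only a locality-neutral move may be blocked by LLC affinity,
	 * and repeated balance failures still override the veto. */
	if (degrades == 0 && llc_verdict == mig_forbid && !nr_balance_failed)
		return false;

	return true;
}
```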
> degrades = migrate_degrades_locality(p, env);
> if (!degrades)
> hot = task_hot(p, env);
> @@ -10146,12 +10153,55 @@ static struct list_head
> list_splice(&pref_old_llc, tasks);
> return tasks;
> }
> +
> +static bool stop_migrate_src_rq(struct task_struct *p,
> + struct lb_env *env,
> + int detached)
> +{
> + if (!sched_cache_enabled() || p->preferred_llc == -1 ||
> + cpus_share_cache(env->src_cpu, env->dst_cpu) ||
> + env->sd->nr_balance_failed)
> + return false;
But you are allowing nr_balance_failed to override things here.
> + /*
> + * Stop migration for the src_rq and pull from a
> + * different busy runqueue in the following cases:
> + *
> + * 1. Trying to migrate task to its preferred
> + * LLC, but the chosen task does not prefer dest
> + * LLC - case 3 in order_tasks_by_llc(). This violates
> + * the goal of migrate_llc_task. However, we should
> + * stop detaching only if some tasks have been detached
> + * and the imbalance has been mitigated.
> + *
> + * 2. Don't detach more tasks if the remaining tasks want
> + * to stay. We know the remaining tasks all prefer the
> + * current LLC, because after order_tasks_by_llc(), the
> + * tasks that prefer the current LLC are the least favored
> + * candidates to be migrated out.
> + */
> + if (env->migration_type == migrate_llc_task &&
> + detached && llc_id(env->dst_cpu) != p->preferred_llc)
> + return true;
> +
> + if (llc_id(env->src_cpu) == p->preferred_llc)
> + return true;
> +
> + return false;
> +}
Also, I think we have a problem with nr_balance_failed: cache_nice_tries
is 1 for SHARE_LLC; this means for failed=0 we ignore:
- ineligible tasks
- llc fail
- node-degrading / hot
and then the very next round, we do all of them at once, without much
grading.
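One way to grade it -- purely illustrative, thresholds and ordering made
up for the example, not a proposal for exact values:

```c
/* Sketch: instead of every constraint flipping from honored to ignored
 * at the same nr_balance_failed threshold, relax them one per failed
 * round, cheapest first. */
#include <assert.h>
#include <stdbool.h>

enum constraint { CONSTR_LLC, CONSTR_HOT, CONSTR_NUMA };

static bool may_override(enum constraint c, int nr_balance_failed)
{
	switch (c) {
	case CONSTR_LLC:  return nr_balance_failed > 0; /* cheapest to drop */
	case CONSTR_HOT:  return nr_balance_failed > 1; /* cache-hot next */
	case CONSTR_NUMA: return nr_balance_failed > 2; /* locality last */
	}
	return false;
}
```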
> @@ -10205,6 +10255,15 @@ static int detach_tasks(struct lb_env *env)
>
> p = list_last_entry(tasks, struct task_struct, se.group_node);
>
> + /*
> + * Check if detaching current src_rq should be stopped, because
> + * doing so would break cache aware load balance. If we stop
> + * here, the env->flags has LBF_ALL_PINNED, which would cause
> + * the load balance to pull from another busy runqueue.
Uhh, can_migrate_task() will clear that ALL_PINNED thing if we've found
at least one task before getting here.
> + */
> + if (stop_migrate_src_rq(p, env, detached))
> + break;
Perhaps split cfs_tasks into multiple lists from the get-go? That avoids
this sorting.
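Something along these lines -- a userspace sketch with a minimal local
list implementation; struct and field names (pref_here, pref_away,
enqueue) are made up, not the kernel's:

```c
/* Sketch: keep tasks on per-LLC-preference lists as they are enqueued,
 * so detach_tasks() can walk the "prefers to leave" list first without
 * any sorting pass. */
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static bool list_empty(const struct list_head *h) { return h->next == h; }

struct rq_lists {
	struct list_head pref_here; /* tasks preferring this rq's LLC */
	struct list_head pref_away; /* tasks preferring another LLC */
};

/* Enqueue straight onto the right list; detach_tasks() would then
 * drain pref_away before touching pref_here. */
static void enqueue(struct rq_lists *rq, struct list_head *node,
		    int task_llc, int rq_llc)
{
	list_add_tail(node, task_llc == rq_llc ? &rq->pref_here
					       : &rq->pref_away);
}
```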