linux-kernel - Re: [PATCH v2 15/23] sched/cache: Respect LLC preference in task migration and detach

Message-ID: <8b1a4729-38d7-46e8-92d6-c30e3d9b1022@intel.com>
Date: Tue, 16 Dec 2025 15:30:29 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>, Tim Chen
	<tim.c.chen@...ux.intel.com>
CC: Ingo Molnar <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>, "Dietmar
 Eggemann" <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, "Valentin
 Schneider" <vschneid@...hat.com>, Madadi Vineeth Reddy
	<vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>, Shrikanth Hegde
	<sshegde@...ux.ibm.com>, Jianyong Wu <jianyong.wu@...look.com>, Yangyu Chen
	<cyy@...self.name>, Tingyin Duan <tingyin.duan@...il.com>, Vern Hao
	<vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>, Len Brown
	<len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu
	<zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li
	<adamli@...amperecomputing.com>, Aaron Lu <ziqianlu@...edance.com>, Tim Chen
	<tim.c.chen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 15/23] sched/cache: Respect LLC preference in task
 migration and detach

On 12/11/2025 12:30 AM, Peter Zijlstra wrote:
> On Wed, Dec 03, 2025 at 03:07:34PM -0800, Tim Chen wrote:
> 
>> @@ -10025,6 +10025,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>   	if (env->flags & LBF_ACTIVE_LB)
>>   		return 1;
>>   
>> +#ifdef CONFIG_SCHED_CACHE
>> +	if (sched_cache_enabled() &&
>> +	    can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid &&
>> +	    !task_has_sched_core(p))
>> +		return 0;
>> +#endif
> 
> This seems wrong:
>   - it does not let nr_balance_failed override things;
>   - it takes precedence over migrate_degrades_locality(); you really want
>     to migrate towards the preferred NUMA node over staying on your LLC.
> 
> That is, this really wants to be done after migrate_degrades_locality()
> and only if degrades == 0 or something.
> 

OK, will fix it.
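
One possible reading of the suggested ordering, as a rough user-space model rather than kernel code: the LLC check is consulted only when migrate_degrades_locality() is neutral (degrades == 0), and it is treated like "hot", so nr_balance_failed can still override it. model_decide() and its parameters are hypothetical names used only for illustration; the actual fix may look different.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the decision at the end of can_migrate_task().
 *
 * degrades   < 0: migration improves NUMA locality
 *           == 0: NUMA-neutral
 *            > 0: migration degrades NUMA locality
 * llc_forbid: moving the task would break its LLC preference
 */
static int model_decide(int degrades, bool task_is_hot, bool llc_forbid,
			unsigned int nr_balance_failed,
			unsigned int cache_nice_tries)
{
	bool hot;

	if (!degrades)
		hot = task_is_hot || llc_forbid;	/* LLC only matters if NUMA-neutral */
	else
		hot = degrades > 0;

	/* Repeated balance failures eventually override "hot". */
	if (!hot || nr_balance_failed > cache_nice_tries)
		return 1;

	return 0;
}
```

In this model a task moving towards its preferred NUMA node (degrades < 0) is migrated even if that breaks its LLC preference.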

>>   	degrades = migrate_degrades_locality(p, env);
>>   	if (!degrades)
>>   		hot = task_hot(p, env);
>> @@ -10146,12 +10153,55 @@ static struct list_head
>>   	list_splice(&pref_old_llc, tasks);
>>   	return tasks;
>>   }
>> +
>> +static bool stop_migrate_src_rq(struct task_struct *p,
>> +				struct lb_env *env,
>> +				int detached)
>> +{
>> +	if (!sched_cache_enabled() || p->preferred_llc == -1 ||
>> +	    cpus_share_cache(env->src_cpu, env->dst_cpu) ||
>> +	    env->sd->nr_balance_failed)
>> +		return false;
> 
> But you are allowing nr_balance_failed to override things here.
> 
>> +	/*
>> +	 * Stop migration for the src_rq and pull from a
>> +	 * different busy runqueue in the following cases:
>> +	 *
>> +	 * 1. Trying to migrate task to its preferred
>> +	 *    LLC, but the chosen task does not prefer dest
>> +	 *    LLC - case 3 in order_tasks_by_llc(). This violates
>> +	 *    the goal of migrate_llc_task. However, we should
>> +	 *    stop detaching only if some tasks have been detached
>> +	 *    and the imbalance has been mitigated.
>> +	 *
>> +	 * 2. Don't detach more tasks if the remaining tasks want
>> +	 *    to stay. We know the remaining tasks all prefer the
>> +	 *    current LLC, because after order_tasks_by_llc(), the
>> +	 *    tasks that prefer the current LLC are the least favored
>> +	 *    candidates to be migrated out.
>> +	 */
>> +	if (env->migration_type == migrate_llc_task &&
>> +	    detached && llc_id(env->dst_cpu) != p->preferred_llc)
>> +		return true;
>> +
>> +	if (llc_id(env->src_cpu) == p->preferred_llc)
>> +		return true;
>> +
>> +	return false;
>> +}
> 
> Also, I think we have a problem with nr_balance_failed, cache_nice_tries
> is 1 for SHARE_LLC; this means for failed=0 we ignore:
> 
>   - ineligible tasks
>   - llc fail
>   - node-degrading / hot
> 
> and then the very next round, we do all of them at once, without much
> grading.
> 

Do you mean we could use a different nr_balance_failed threshold for
each of the scenarios you listed above, so that detach_tasks() does
not start migrating all three kinds of tasks in the same round?

For example (pseudocode):

ineligible tasks check:
if (env->sd->nr_balance_failed > env->sd->cache_nice_tries)
     can_migrate;

llc fail check:
if (env->sd->nr_balance_failed > env->sd->cache_nice_tries + 1)
     can_migrate;

node-degrading/hot check:
if (env->sd->nr_balance_failed > env->sd->cache_nice_tries + 2)
     can_migrate;
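
The three checks above could be folded into one helper. This is only a user-space sketch under the assumption that each class needs one more failed attempt than the previous one; enum mig_obstacle and failed_enough() are hypothetical names, not existing kernel symbols, and the offsets are just the example values above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of why a task would normally be skipped. */
enum mig_obstacle {
	MIG_INELIGIBLE = 0,	/* ineligible task */
	MIG_LLC_FAIL,		/* migration would break LLC preference */
	MIG_NODE_HOT,		/* node-degrading or cache-hot */
	MIG_OBSTACLE_MAX,
};

/*
 * Return true once enough balance attempts have failed to override
 * the given obstacle. Each class needs one more failed attempt than
 * the previous one, so they are not all overridden in the same round.
 */
static bool failed_enough(enum mig_obstacle why,
			  unsigned int nr_balance_failed,
			  unsigned int cache_nice_tries)
{
	assert(why < MIG_OBSTACLE_MAX);
	return nr_balance_failed > cache_nice_tries + (unsigned int)why;
}
```

With cache_nice_tries == 1, ineligible tasks would be released at nr_balance_failed == 2, LLC failures at 3, and node-degrading/hot tasks at 4.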


>> @@ -10205,6 +10255,15 @@ static int detach_tasks(struct lb_env *env)
>>   
>>   		p = list_last_entry(tasks, struct task_struct, se.group_node);
>>   
>> +		/*
>> +		 * Check if detaching current src_rq should be stopped, because
>> +		 * doing so would break cache aware load balance. If we stop
>> +		 * here, the env->flags has LBF_ALL_PINNED, which would cause
>> +		 * the load balance to pull from another busy runqueue.
> 
> Uhh, can_migrate_task() will clear that ALL_PINNED thing if we've found
> at least one task before getting here.
> 

One problem is that LBF_ALL_PINNED is cleared in can_migrate_task()
(called from detach_tasks()) before migrate_degrades_locality() and
can_migrate_llc_task() are reached. I suppose we want to keep
LBF_ALL_PINNED set if can_migrate_llc_task() fails, i.e. when the
migration would break LLC locality.
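
A rough user-space model of that idea, assuming the flag is cleared only after the LLC preference check has passed. struct model_task, struct model_env and model_can_migrate() are hypothetical and only mirror the shape of the kernel code; whether the clearing should really move is still open.

```c
#include <stdbool.h>

#define LBF_ALL_PINNED 0x01	/* model of the kernel flag, for illustration */

struct model_task {
	bool allowed_on_dst;	/* affinity permits the destination CPU */
	bool llc_forbids;	/* moving would break LLC preference */
};

struct model_env {
	unsigned int flags;
};

/*
 * Clear LBF_ALL_PINNED only after the LLC preference check has passed,
 * so a runqueue whose tasks all fail that check still looks "all
 * pinned" to the caller, which can then pick another busy runqueue.
 */
static int model_can_migrate(const struct model_task *p, struct model_env *env)
{
	if (!p->allowed_on_dst)
		return 0;	/* truly pinned: flag stays set */

	if (p->llc_forbids)
		return 0;	/* flag intentionally left set */

	env->flags &= ~LBF_ALL_PINNED;
	return 1;
}
```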

>> +		 */
>> +		if (stop_migrate_src_rq(p, env, detached))
>> +			break;
> 
> 
> Perhaps split cfs_tasks into multiple lists from the get-go? That avoids
> this sorting.

Will check with Tim on this.

thanks,
Chenyu