linux-kernel - Re: [PATCH 15/19] sched/fair: Respect LLC preference in task migration and detach

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <35424dcfef4caf32076b4bbece2dafddb495e730.camel@linux.intel.com>
Date: Mon, 03 Nov 2025 13:41:34 -0800
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: "Chen, Yu C" <yu.c.chen@...el.com>, K Prateek Nayak
 <kprateek.nayak@....com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli	
 <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>,  Valentin Schneider	 <vschneid@...hat.com>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Hillf Danton
 <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu	
 <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan	
 <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len Brown	
 <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu	
 <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li	
 <adamli@...amperecomputing.com>, Tim Chen <tim.c.chen@...el.com>, 
	linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
 "Gautham R . Shenoy" <gautham.shenoy@....com>, Ingo Molnar
 <mingo@...hat.com>
Subject: Re: [PATCH 15/19] sched/fair: Respect LLC preference in task
 migration and detach

On Fri, 2025-10-31 at 23:17 +0800, Chen, Yu C wrote:
> Hi Prateek,
> 
> On 10/31/2025 11:32 AM, K Prateek Nayak wrote:
> > Hello Tim,
> > 
> > On 10/31/2025 1:37 AM, Tim Chen wrote:
> > > On Thu, 2025-10-30 at 09:49 +0530, K Prateek Nayak wrote:
> > > > Hello Tim,
> > > > 
> > > > On 10/30/2025 2:39 AM, Tim Chen wrote:
> > > > > > > I suppose you are suggesting that the threshold for stopping task detachment
> > > > > > > should be higher. With the above can_migrate_llc() check, I suppose we have
> > > > > > > raised the threshold for stopping "task detachment"?
> > > > > > 
> > > > > > Say the LLC is under heavy load and we only have overloaded groups.
> > > > > > can_migrate_llc() would return "mig_unrestricted" since
> > > > > > fits_llc_capacity() would return false.
> > > > > > 
> > > > > > Since we are under "migrate_load", sched_balance_find_src_rq() has
> > > > > > returned the CPU with the highest load which could very well be the
> > > > > > CPU with with a large number of preferred LLC tasks.
> > > > > > 
> > > > > > sched_cache_enabled() is still true and when detach_tasks() reaches
> > > > > > one of these preferred llc tasks (which comes at the very end of the
> > > > > > tasks list),
> > > > > > we break out even if env->imbalance > 0 leaving
> > > > > 
> > > > > Yes, but at least one task has been removed to even the load (making forward progress) and
> > > > > the remaining tasks all wish to stay in the current LLC and will
> > > > > preferred not to be moved. My thought was to not even all the load out
> > > > > in one shot and pull more tasks out of their preferred LLC.
> > > > > If the imbalance still remain, we'll come to that in the next load balance.
> > > > 
> > > > In that case, can we spoof a LBF_ALL_PINNED for the case where we start
> > > 
> > > In the code chunk (with fix I mentioned in last reply):
> > > 
> > > +#ifdef CONFIG_SCHED_CACHE
> > > +		/*
> > > +		 * Don't detach more tasks if the remaining tasks want
> > > +		 * to stay. We know the remaining tasks all prefer the
> > > +		 * current LLC, because after order_tasks_by_llc(), the
> > > +		 * tasks that prefer the current LLC are at the tail of
> > > +		 * the list. The inhibition of detachment is to avoid too
> > > +		 * many tasks being migrated out of the preferred LLC.
> > > +		 */
> > > +		if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
> > > +		    llc_id(env->src_cpu) == p->preferred_llc &&
> > > 		    llc_id(env->dst_cpu) != p->preferred_llc)
> > > +			break;
> > > 
> > > We have already pulled at least one task when we stop detaching because we
> > > know that all the remaining tasks want to stay in it current LLC.
> > > "detached" is non zero when we break. So LBF_ALL_PINNED would be cleared.
> > > We will only exit the detach_tasks loop when there are truly no tasks
> > > that can be moved and it is truly a LBF_ALL_PINNED case.
> > 
> > So what I was suggesting is something like:
> > 
> > @@ -10251,6 +10252,7 @@ static int detach_tasks(struct lb_env *env)
> >   	unsigned long util, load;
> >   	struct task_struct *p;
> >   	int detached = 0;
> > +	bool preserve_preferred;
> >   
> >   	lockdep_assert_rq_held(env->src_rq);
> >   
> > @@ -10268,6 +10270,10 @@ static int detach_tasks(struct lb_env *env)
> >   
> >   	tasks = order_tasks_by_llc(env, &env->src_rq->cfs_tasks);
> >   
> > +	preserve_preferred = sched_cache_enabled() &&
> > +			     !(env->sd->flags & SD_SHARE_LLC) &&
> 
> Maybe also check (env->sd->child->flag & SD_SHARE_LLC) because we only
> care about the domain that is the parent of a LLC domain.
> 
> > +			     !sd->nr_balance_failed;
>  > +
> >   	while (!list_empty(tasks)) {
> >   		/*
> >   		 * We don't want to steal all, otherwise we may be treated likewise,
> > @@ -10370,16 +10376,15 @@ static int detach_tasks(struct lb_env *env)
> >   
> >   #ifdef CONFIG_SCHED_CACHE
> >   		/*
> > -		 * Don't detach more tasks if the remaining tasks want
> > -		 * to stay. We know the remaining tasks all prefer the
> > -		 * current LLC, because after order_tasks_by_llc(), the
> > -		 * tasks that prefer the current LLC are at the tail of
> > -		 * the list. The inhibition of detachment is to avoid too
> > -		 * many tasks being migrated out of the preferred LLC.
> > +		 * We've hit tasks that prefer src LLC while balancing between LLCs.
> > +		 * If previous balances have been successful, pretend the rest of the
> > +		 * tasks on this CPU are pinned and let the main load balancing loop
> > +		 * find another target CPU to pull from if imbalance exists.
> >   		 */
> > -		if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
> > -		    llc_id(env->src_cpu) == p->preferred_llc)
> > +		if (preserve_preferred && detached && llc_id(env->src_cpu) == p->preferred_llc) {
> > +			env->flags |= LBF_ALL_PINNED;
> 
> Let me try to understand this strategy: if all previous migrations
> on this sched_domain have succeeded, it means that even if we stop
> migrating tasks out of this busiest CPU from now on, it won’t
> matter because the imbalance has already been mitigated. If we stop
> the migration, we should look for other busy CPUs to pull some tasks
> from. One concern is that setting LBF_ALL_PINNED and only clearing
> env->dst_cpu will trigger a full re-scan of the entire sched_domain,
> which might be costly-especially on large LLCs. We can try this to
> see if it has any impact on the benchmark.

I think it does cause update_sd_lb_stats() to be called again with
the previous rq taken out.  So we are spending more CPU cycles
to find an alternative task to balance to try to preserve LLC preference.

Tim

> 
> thanks,
> Chenyu
> 
> >   			break;
> > +		}
> >   #endif
> >   
> >