Message-ID: <ebe994addb5624089db71df8fee402a664f8800a.camel@linux.intel.com>
Date: Wed, 29 Oct 2025 14:09:04 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: K Prateek Nayak <kprateek.nayak@....com>, "Chen, Yu C"
 <yu.c.chen@...el.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli	
 <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>,  Valentin Schneider	 <vschneid@...hat.com>,
 Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Hillf Danton
 <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>, Jianyong Wu	
 <jianyong.wu@...look.com>, Yangyu Chen <cyy@...self.name>, Tingyin Duan	
 <tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>, Len Brown	
 <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>, Zhao Liu	
 <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>, Adam Li	
 <adamli@...amperecomputing.com>, Tim Chen <tim.c.chen@...el.com>, 
	linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
 "Gautham R . Shenoy" <gautham.shenoy@....com>, Ingo Molnar
 <mingo@...hat.com>
Subject: Re: [PATCH 15/19] sched/fair: Respect LLC preference in task
 migration and detach

On Wed, 2025-10-29 at 09:24 +0530, K Prateek Nayak wrote:
> Hello Chenyu,
> 
> On 10/28/2025 5:28 PM, Chen, Yu C wrote:
> > Hi Prateek,
> > 
> > On 10/28/2025 2:02 PM, K Prateek Nayak wrote:
> > > Hello Tim,
> > > 
> > > On 10/11/2025 11:54 PM, Tim Chen wrote:
> > > > @@ -9969,6 +9969,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> > > >       if (env->flags & LBF_ACTIVE_LB)
> > > >           return 1;
> > > >   +#ifdef CONFIG_SCHED_CACHE
> > > > +    if (sched_cache_enabled() &&
> > > > +        can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid)
> > > > +        return 0;
> > > > +#endif
> > > > +
> > > >       degrades = migrate_degrades_locality(p, env);
> > > >       if (!degrades)
> > > >           hot = task_hot(p, env);
> > > 
> > > Should we care for task_hot() w.r.t. migration cost if a task is being
> > > moved to a preferred LLC?
> > > 
> > 
> > This is a good question. The decision not to migrate a task when its
> > LLC preference would be violated takes priority over the check in
> > task_hot().
> > 
> > The main reason is that we want cache-aware aggregation to be more
> > aggressive than generic migration; otherwise, according to our
> > previous tests, cache-aware migration might not take effect. This
> > seems to be a trade-off. Another consideration might be: should we
> > consider the occupancy of a single thread or that of the entire
> > process? For example, suppose t0, t1, and t2 belong to the same
> > process. t0 and t1 are running on the process's preferred LLC0, while
> > t2 is running on the non-preferred LLC1. Even though t2 has high
> > occupancy on LLC1 (making it cache-hot there), we might still want to
> > move t2 to LLC0 if t0, t1, and t2 read from and write to each other,
> > since we don't want to generate cross-LLC accesses.
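To illustrate the whole-process view described above, here is a minimal
userspace sketch (the array sizes, occupancy numbers, and the helper name
preferred_llc_of_process are all made up for illustration; the kernel tracks
per-mm occupancy averages quite differently): summing the threads' occupancies
per LLC picks LLC0 as the process's preferred LLC even though t2 is
individually cache-hot on LLC1.

```c
#include <assert.h>

#define NR_LLCS 2
#define NR_THREADS 3

/* Hypothetical per-thread cache occupancy, indexed [thread][llc].
 * Aggregating across the whole process (rather than looking at one
 * thread's hotness) decides the preferred LLC. */
static int preferred_llc_of_process(const int occ[NR_THREADS][NR_LLCS])
{
	int total[NR_LLCS] = { 0 };
	int best = 0;

	/* Sum each LLC's occupancy over all threads of the process. */
	for (int t = 0; t < NR_THREADS; t++)
		for (int l = 0; l < NR_LLCS; l++)
			total[l] += occ[t][l];

	/* The LLC with the highest aggregate occupancy wins. */
	for (int l = 1; l < NR_LLCS; l++)
		if (total[l] > total[best])
			best = l;
	return best;
}
```

With t0 and t1 occupying LLC0 and only t2 occupying LLC1, the aggregate favors
LLC0, so t2 would still be pulled there despite being locally cache-hot.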
> 
> Makes sense. That would need some heuristics based on the avg_running
> to know which LLC can be a potential target with the fewest migrations.
> But then again, in a dynamic system things change so quickly - what
> you have now seems to be a good start to further optimize on top of.
> 
> > 
> > > Also, should we leave out tasks under core scheduling from the llc
> > > aware lb? Even discount them when calculating "mm->nr_running_avg"?
> > > 
> > Yes, it seems that the cookie match check case was missed, which is
> > embedded in task_hot(). I suppose you are referring to the p->core_cookie
> > check; I'll look in this direction.
> 
> Yup! I think if a user has opted into core scheduling, they should
> ideally not bother with cache-aware scheduling.
> 
> > 
> > > > @@ -10227,6 +10233,20 @@ static int detach_tasks(struct lb_env *env)
> > > >           if (env->imbalance <= 0)
> > > >               break;
> > > >   +#ifdef CONFIG_SCHED_CACHE
> > > > +        /*
> > > > +         * Don't detach more tasks if the remaining tasks want
> > > > +         * to stay. We know the remaining tasks all prefer the
> > > > +         * current LLC, because after order_tasks_by_llc(), the
> > > > +         * tasks that prefer the current LLC are at the tail of
> > > > +         * the list. The inhibition of detachment is to avoid too
> > > > +         * many tasks being migrated out of the preferred LLC.
> > > > +         */
> > > > +        if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
> > > > +            llc_id(env->src_cpu) == p->preferred_llc)
> > > > +            break;
> > > 
> > > In all cases? 
> > > 

Not in all cases, but only when we know that the remaining tasks prefer to
stay in the current LLC rather than be moved to an LLC they don't prefer.

I think we need to add a check that
llc_id(env->dst_cpu) != p->preferred_llc to the above condition.
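A self-contained sketch of the amended condition (the struct layouts, the
4-CPUs-per-LLC llc_id() mapping, and the helper name stop_detaching() below
are simplified stand-ins, not the kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the scheduler state involved in the
 * check; only the field names mirror the patch. */
struct task { int preferred_llc; };
struct lb_env { int src_cpu; int dst_cpu; };

/* Assume 4 CPUs per LLC for this sketch. */
static int llc_id(int cpu) { return cpu / 4; }

/* The break condition from the patch, amended with the extra dst_cpu
 * test suggested above: stop detaching only when the task prefers the
 * source LLC *and* the destination is a different LLC. */
static bool stop_detaching(const struct lb_env *env,
			   const struct task *p, int detached)
{
	return detached && p->preferred_llc != -1 &&
	       llc_id(env->src_cpu) == p->preferred_llc &&
	       llc_id(env->dst_cpu) != p->preferred_llc;
}
```

With the extra test, a pull whose destination sits in the same preferred LLC
no longer stops detachment.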

> > > Should we check can_migrate_llc() wrt to util migrated and
> > > then make a call if we should move the preferred LLC tasks or not?
> > > 
> > 
> > Prior to this "stop detaching tasks" check, we performed a
> > can_migrate_task(p) to determine whether the detached p is dequeued
> > from its preferred LLC, and in can_migrate_task(), we use
> > can_migrate_llc_task() -> can_migrate_llc() to carry out the check.
> > That is to say, only after certain tasks have been detached will we
> > stop further detaching.
> > 
> > > Perhaps disallow it the first time if "nr_balance_failed" is 0 but
> > > subsequent failed attempts should perhaps explore breaking the preferred
> > > llc restriction if there is an imbalance and we are under
> > > "mig_unrestricted" conditions.
> > > 
> > 
> > I suppose you are suggesting that the threshold for stopping task
> > detachment should be higher. With the above can_migrate_llc() check,
> > haven't we already raised the threshold for stopping task detachment?
> 
> Say the LLC is under heavy load and we only have overloaded groups.
> can_migrate_llc() would return "mig_unrestricted" since
> fits_llc_capacity() would return false.
> 
> Since we are under "migrate_load", sched_balance_find_src_rq() has
> returned the CPU with the highest load, which could very well be the
> CPU with a large number of preferred-LLC tasks.
> 
> sched_cache_enabled() is still true and when detach_tasks() reaches
> one of these preferred llc tasks (which comes at the very end of the
> tasks list), 
> we break out even if env->imbalance > 0 leaving

Yes, but at least one task has been removed to even out the load (making
forward progress), and the remaining tasks all wish to stay in the current
LLC and would prefer not to be moved. My thought was not to even out all
the load in one shot by pulling more tasks out of their preferred LLC.
If an imbalance still remains, we'll get to it in the next load balance.

Pulling tasks more slowly when we come to tasks that prefer to stay (where
possible) would also help to prevent tasks from bouncing between LLCs.

Tim

> potential imbalance for the "migrate_load" case.
> 
> Instead, we can account for the util moved out of the src_llc and
> after accounting for it, check if can_migrate_llc() would return
> "mig_forbid" for the src llc.
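That suggestion could look something like the following model (a userspace
sketch: the signature, the int utilization units, and the half-capacity
threshold are all invented for illustration; only the mig_* names come from
the patch). detach_tasks() would accumulate the utilization it has moved and
stop once the source LLC would be drained past the point where
can_migrate_llc() returns "mig_forbid":

```c
#include <assert.h>

/* Hypothetical model: re-evaluate the migration class after each
 * detachment, accounting for utilization already moved out of the
 * source LLC. */
enum mig_result { mig_unrestricted, mig_forbid };

static enum mig_result can_migrate_llc(int src_util, int util_moved,
				       int src_capacity)
{
	/* Made-up threshold: forbid further pulls once the source
	 * LLC's remaining utilization would drop below half its
	 * capacity. */
	if (src_util - util_moved < src_capacity / 2)
		return mig_forbid;
	return mig_unrestricted;
}
```

The detach loop would then break on mig_forbid instead of breaking
unconditionally at the first preferred-LLC task, which keeps pulling while the
source LLC is still overloaded.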
