Message-ID: <20251210163013.GW3707891@noisy.programming.kicks-ass.net>
Date: Wed, 10 Dec 2025 17:30:13 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>,
Yangyu Chen <cyy@...self.name>,
Tingyin Duan <tingyin.duan@...il.com>,
Vern Hao <vernhao@...cent.com>, Vern Hao <haoxing990@...il.com>,
Len Brown <len.brown@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Zhao Liu <zhao1.liu@...el.com>, Chen Yu <yu.chen.surf@...il.com>,
Chen Yu <yu.c.chen@...el.com>,
Adam Li <adamli@...amperecomputing.com>,
Aaron Lu <ziqianlu@...edance.com>, Tim Chen <tim.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 15/23] sched/cache: Respect LLC preference in task
migration and detach
On Wed, Dec 03, 2025 at 03:07:34PM -0800, Tim Chen wrote:
> @@ -10025,6 +10025,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> if (env->flags & LBF_ACTIVE_LB)
> return 1;
>
> +#ifdef CONFIG_SCHED_CACHE
> + if (sched_cache_enabled() &&
> + can_migrate_llc_task(env->src_cpu, env->dst_cpu, p) == mig_forbid &&
> + !task_has_sched_core(p))
> + return 0;
> +#endif
This seems wrong:
- it does not let nr_balance_failed override things;
- it takes precedence over migrate_degrades_locality(); you really want
  to migrate towards the preferred NUMA node over staying on your LLC.
That is, this really wants to be done after migrate_degrades_locality()
and only if degrades == 0 or something.
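Roughly something like the below -- a sketch only, with stub variables
standing in for the real kernel helpers (migrate_degrades_locality(),
can_migrate_llc_task(), env->sd->nr_balance_failed); the point is the
ordering, not the exact code:

```c
/* Sketch: run the LLC veto only after migrate_degrades_locality(),
 * only for a locality-neutral move, and let nr_balance_failed
 * override it.  All identifiers here are illustrative stand-ins. */
#include <assert.h>
#include <stdbool.h>

enum { mig_allow, mig_forbid };

static int degrades;          /* result of migrate_degrades_locality() */
static int llc_verdict;       /* result of can_migrate_llc_task() */
static int nr_balance_failed; /* env->sd->nr_balance_failed */

static bool can_migrate(void)
{
	/* NUMA first: a move towards the preferred node (degrades < 0)
	 * must not be vetoed on LLC grounds. */
	if (degrades < 0)
		return true;

	/* Only a locality-neutral move may be blocked by LLC affinity,
	 * and repeated balance failures still override the veto. */
	if (degrades == 0 && llc_verdict == mig_forbid && !nr_balance_failed)
		return false;

	return true;
}
```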
> degrades = migrate_degrades_locality(p, env);
> if (!degrades)
> hot = task_hot(p, env);
> @@ -10146,12 +10153,55 @@ static struct list_head
> list_splice(&pref_old_llc, tasks);
> return tasks;
> }
> +
> +static bool stop_migrate_src_rq(struct task_struct *p,
> + struct lb_env *env,
> + int detached)
> +{
> + if (!sched_cache_enabled() || p->preferred_llc == -1 ||
> + cpus_share_cache(env->src_cpu, env->dst_cpu) ||
> + env->sd->nr_balance_failed)
> + return false;
But you are allowing nr_balance_failed to override things here.
> + /*
> + * Stop migration for the src_rq and pull from a
> + * different busy runqueue in the following cases:
> + *
> + * 1. Trying to migrate task to its preferred
> + * LLC, but the chosen task does not prefer dest
> + * LLC - case 3 in order_tasks_by_llc(). This violates
> + * the goal of migrate_llc_task. However, we should
> + * stop detaching only if some tasks have been detached
> + * and the imbalance has been mitigated.
> + *
> + * 2. Don't detach more tasks if the remaining tasks want
> + * to stay. We know the remaining tasks all prefer the
> + * current LLC, because after order_tasks_by_llc(), the
> + * tasks that prefer the current LLC are the least favored
> + * candidates to be migrated out.
> + */
> + if (env->migration_type == migrate_llc_task &&
> + detached && llc_id(env->dst_cpu) != p->preferred_llc)
> + return true;
> +
> + if (llc_id(env->src_cpu) == p->preferred_llc)
> + return true;
> +
> + return false;
> +}
Also, I think we have a problem with nr_balance_failed: cache_nice_tries
is 1 for SHARE_LLC; this means for failed=0 we ignore:
- ineligible tasks
- llc fail
- node-degrading / hot
and then the very next round, we do all of them at once, without much
grading.
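One way to grade it -- purely illustrative, thresholds and ordering made
up for the example, not a proposal for exact values:

```c
/* Sketch: instead of every constraint flipping from honored to ignored
 * at the same nr_balance_failed threshold, relax them one per failed
 * round, cheapest first. */
#include <assert.h>
#include <stdbool.h>

enum constraint { CONSTR_LLC, CONSTR_HOT, CONSTR_NUMA };

static bool may_override(enum constraint c, int nr_balance_failed)
{
	switch (c) {
	case CONSTR_LLC:  return nr_balance_failed > 0; /* cheapest to drop */
	case CONSTR_HOT:  return nr_balance_failed > 1; /* cache-hot next */
	case CONSTR_NUMA: return nr_balance_failed > 2; /* locality last */
	}
	return false;
}
```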
> @@ -10205,6 +10255,15 @@ static int detach_tasks(struct lb_env *env)
>
> p = list_last_entry(tasks, struct task_struct, se.group_node);
>
> + /*
> + * Check if detaching current src_rq should be stopped, because
> + * doing so would break cache aware load balance. If we stop
> + * here, the env->flags has LBF_ALL_PINNED, which would cause
> + * the load balance to pull from another busy runqueue.
Uhh, can_migrate_task() will clear that ALL_PINNED thing if we've found
at least one task before getting here.
> + */
> + if (stop_migrate_src_rq(p, env, detached))
> + break;
Perhaps split cfs_tasks into multiple lists from the get-go? That avoids
this sorting.
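Something along these lines -- a userspace sketch with a minimal local
list implementation; struct and field names (pref_here, pref_away,
enqueue) are made up, not the kernel's:

```c
/* Sketch: keep tasks on per-LLC-preference lists as they are enqueued,
 * so detach_tasks() can walk the "prefers to leave" list first without
 * any sorting pass. */
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static bool list_empty(const struct list_head *h) { return h->next == h; }

struct rq_lists {
	struct list_head pref_here; /* tasks preferring this rq's LLC */
	struct list_head pref_away; /* tasks preferring another LLC */
};

/* Enqueue straight onto the right list; detach_tasks() would then
 * drain pref_away before touching pref_here. */
static void enqueue(struct rq_lists *rq, struct list_head *node,
		    int task_llc, int rq_llc)
{
	list_add_tail(node, task_llc == rq_llc ? &rq->pref_here
					       : &rq->pref_away);
}
```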