Message-ID: <99c296f4-35bf-4427-95e8-0cdcc423f999@intel.com>
Date: Sat, 5 Jul 2025 10:26:30 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
CC: Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
<vschneid@...hat.com>, Tim Chen <tim.c.chen@...el.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Libo Chen <libo.chen@...cle.com>, Abel Wu
<wuyun.abel@...edance.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>, Len Brown <len.brown@...el.com>,
<linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>, "Ingo
Molnar" <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>, Tim Chen
<tim.c.chen@...ux.intel.com>
Subject: Re: [RFC patch v3 14/20] sched: Introduce update_llc_busiest() to
deal with groups having preferred LLC tasks

On 7/4/2025 3:52 AM, Shrikanth Hegde wrote:
>
>
> On 6/18/25 23:58, Tim Chen wrote:
>> The load balancer attempts to identify the busiest sched_group with
>> the highest load and migrates some tasks to a less busy sched_group
>> to distribute the load across different CPUs.
>>
>> When cache-aware scheduling is enabled, the busiest sched_group is
>> defined as the one with the highest number of tasks preferring to run
>> on the destination LLC. If the busiest group has llc_balance tag,
>> the cache aware load balance will be launched.
>>
>> Introduce the helper function update_llc_busiest() to identify
>> such sched group with most tasks preferring the destination LLC.
>>
>> Co-developed-by: Chen Yu <yu.c.chen@...el.com>
>> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
>> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
>> ---
>> kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
>> 1 file changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 48a090c6e885..ab3d1239d6e4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env
>> *env, struct sg_lb_stats *sgs,
>> return false;
>> }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> + struct sg_lb_stats *busiest,
>> + struct sg_lb_stats *sgs)
>> +{
>> + int idx;
>> +
>> + /* Only the candidate with llc_balance need to be taken care of */
>> + if (!sgs->group_llc_balance)
>> + return false;
>> +
>> + /*
>> + * There are more tasks that want to run on dst_cpu's LLC.
>> + */
>> + idx = llc_idx(env->dst_cpu);
>> + return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
>> +}
>> #else
>> static inline bool llc_balance(struct lb_env *env, struct
>> sg_lb_stats *sgs,
>> struct sched_group *group)
>> {
>> return false;
>> }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> + struct sg_lb_stats *busiest,
>> + struct sg_lb_stats *sgs)
>> +{
>> + return false;
>> +}
>> #endif
>> static inline long sibling_imbalance(struct lb_env *env,
>> @@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct
>> lb_env *env,
>> sds->local_stat.group_type != group_has_spare))
>> return false;
>> + /* deal with prefer LLC load balance, if failed, fall into normal
>> load balance */
>> + if (update_llc_busiest(env, busiest, sgs))
>> + return true;
>> +
>> + /* if there is already a busy group, skip the normal load balance */
>> + if (busiest->group_llc_balance)
>> + return false;
>> +
>
> If you had a group which was group_overloaded but it could have
> group_llc_balance right?

Yes.

> In this case the priorities based on group_type is not followed no?
>

Currently, group_llc_balance takes precedence over the normal group_type
ordering. The flag is set by _get_migrate_hint(), and we have tried to set
it conservatively so that it does not disrupt normal load balancing.

For example, group_llc_balance is not set when both the destination LLC
and the source LLC exceed 50% of the average utilization. The threshold for
group_overloaded, by comparison, is about 85% utilization
(imbalance_pct=117). So in this case group_overloaded would be honored.
>> if (sgs->group_type > busiest->group_type)
>> return true;
>> @@ -11991,9 +12023,11 @@ static struct sched_group
>> *sched_balance_find_src_group(struct lb_env *env)
>> /*
>> * Try to move all excess tasks to a sibling domain of the busiest
>> * group's child domain.
>> + * Also do so if we can move some tasks that prefer the local LLC.
>> */
>> if (sds.prefer_sibling && local->group_type == group_has_spare &&
>> - sibling_imbalance(env, &sds, busiest, local) > 1)
>> + (busiest->group_llc_balance ||
>> + sibling_imbalance(env, &sds, busiest, local) > 1))
>> goto force_balance;
>> if (busiest->group_type != group_overloaded) {
>
> Also, This load balancing happening due to llc could be very tricky to
> debug.
> Any stats added to schedstat or sched/debug?

OK, we can add some in the next version.

Thanks,
Chenyu