Message-ID: <99c296f4-35bf-4427-95e8-0cdcc423f999@intel.com>
Date: Sat, 5 Jul 2025 10:26:30 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
CC: Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann
	<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
	<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
	<vschneid@...hat.com>, Tim Chen <tim.c.chen@...el.com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Libo Chen <libo.chen@...cle.com>, Abel Wu
	<wuyun.abel@...edance.com>, Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
	Hillf Danton <hdanton@...a.com>, Len Brown <len.brown@...el.com>,
	<linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>, "Ingo
 Molnar" <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>, Tim Chen
	<tim.c.chen@...ux.intel.com>
Subject: Re: [RFC patch v3 14/20] sched: Introduce update_llc_busiest() to
 deal with groups having preferred LLC tasks

On 7/4/2025 3:52 AM, Shrikanth Hegde wrote:
> 
> 
> On 6/18/25 23:58, Tim Chen wrote:
>> The load balancer attempts to identify the busiest sched_group with
>> the highest load and migrates some tasks to a less busy sched_group
>> to distribute the load across different CPUs.
>>
>> When cache-aware scheduling is enabled, the busiest sched_group is
>> defined as the one with the highest number of tasks preferring to run
>> on the destination LLC. If the busiest group has llc_balance tag,
>> the cache aware load balance will be launched.
>>
>> Introduce the helper function update_llc_busiest() to identify
>> such sched group with most tasks preferring the destination LLC.
>>
>> Co-developed-by: Chen Yu <yu.c.chen@...el.com>
>> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
>> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
>> ---
>>   kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
>>   1 file changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 48a090c6e885..ab3d1239d6e4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
>>       return false;
>>   }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> +                   struct sg_lb_stats *busiest,
>> +                   struct sg_lb_stats *sgs)
>> +{
>> +    int idx;
>> +
>> +    /* Only the candidate with llc_balance need to be taken care of */
>> +    if (!sgs->group_llc_balance)
>> +        return false;
>> +
>> +    /*
>> +     * There are more tasks that want to run on dst_cpu's LLC.
>> +     */
>> +    idx = llc_idx(env->dst_cpu);
>> +    return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
>> +}
>>   #else
>>   static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
>>                      struct sched_group *group)
>>   {
>>       return false;
>>   }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> +                   struct sg_lb_stats *busiest,
>> +                   struct sg_lb_stats *sgs)
>> +{
>> +    return false;
>> +}
>>   #endif
>>   static inline long sibling_imbalance(struct lb_env *env,
>> @@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>>            sds->local_stat.group_type != group_has_spare))
>>           return false;
>> +    /* deal with prefer LLC load balance, if failed, fall into normal load balance */
>> +    if (update_llc_busiest(env, busiest, sgs))
>> +        return true;
>> +
>> +    /* if there is already a busy group, skip the normal load balance */
>> +    if (busiest->group_llc_balance)
>> +        return false;
>> +
> 
> If you had a group which was group_overloaded but it could have 
> group_llc_balance right?

Yes.

> In this case the priorities based on group_type is not followed no?
> 

Currently, group_llc_balance appears to take precedence over the
normal group_type. The setting of group_llc_balance is determined by
_get_migrate_hint(). We've made efforts to set this flag carefully to
avoid disrupting the normal load balancing.
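
To make the ordering concrete, here is a rough sketch of the pick logic as it
reads in the quoted hunk. It is not the patch code: struct stats, NR_LLC,
llc_busiest() and pick_busiest() are hypothetical names, the enum is reduced
to two values, and the real update_sd_pick_busiest() has further tie-breaking
that is omitted here.

```c
#include <stdbool.h>

#define NR_LLC 4 /* hypothetical LLC count, for illustration only */

/* Reduced enum: the kernel has more group types between these two. */
enum group_type { group_has_spare = 0, group_overloaded = 1 };

/* Hypothetical stand-in for the relevant sg_lb_stats fields. */
struct stats {
	enum group_type group_type;
	bool group_llc_balance;
	unsigned int nr_pref_llc[NR_LLC];
};

/* Mirrors update_llc_busiest(): only llc_balance candidates qualify. */
static bool llc_busiest(const struct stats *busiest,
			const struct stats *sgs, int dst_llc)
{
	if (!sgs->group_llc_balance)
		return false;
	return sgs->nr_pref_llc[dst_llc] > busiest->nr_pref_llc[dst_llc];
}

/* The LLC check runs first, so it can override the group_type order. */
static bool pick_busiest(const struct stats *busiest,
			 const struct stats *sgs, int dst_llc)
{
	if (llc_busiest(busiest, sgs, dst_llc))
		return true;
	/* An LLC-tagged busiest group is not replaced by normal balance. */
	if (busiest->group_llc_balance)
		return false;
	return sgs->group_type > busiest->group_type;
}
```

With this model, a group_has_spare candidate that carries group_llc_balance
replaces a group_overloaded busiest group, which is the precedence described
above.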

For example, group_llc_balance won't be set when both the destination
LLC and the source LLC exceed 50% of the average utilization. As for
group_overloaded, its threshold is about 85% utilization
(imbalance_pct=117, i.e. 100/117). So in this case group_overloaded
would still be honored.
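
The 85% figure follows from imbalance_pct=117. A minimal sketch of the
arithmetic, assuming a simplified model of the utilization comparison only
(the helper names are hypothetical and the other conditions of the real
overloaded check are left out):

```c
#include <stdbool.h>

/*
 * Simplified model: utilization crosses the overloaded threshold once
 * util * imbalance_pct exceeds capacity * 100, which is the same as
 * util / capacity > 100 / imbalance_pct.
 */
static bool util_exceeds_overloaded_threshold(unsigned long util,
					      unsigned long capacity,
					      unsigned int imbalance_pct)
{
	return capacity * 100 < util * imbalance_pct;
}

/* Threshold as an integer percentage, rounded down. */
static unsigned int overloaded_threshold_pct(unsigned int imbalance_pct)
{
	return 100 * 100 / imbalance_pct;
}
```

For imbalance_pct=117 the threshold helper yields 85, and a group at half of
its capacity (for example util 512 of capacity 1024) stays well below it.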

>>       if (sgs->group_type > busiest->group_type)
>>           return true;
>> @@ -11991,9 +12023,11 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
>>       /*
>>        * Try to move all excess tasks to a sibling domain of the busiest
>>        * group's child domain.
>> +     * Also do so if we can move some tasks that prefer the local LLC.
>>        */
>>       if (sds.prefer_sibling && local->group_type == group_has_spare &&
>> -        sibling_imbalance(env, &sds, busiest, local) > 1)
>> +        (busiest->group_llc_balance ||
>> +        sibling_imbalance(env, &sds, busiest, local) > 1))
>>           goto force_balance;
>>       if (busiest->group_type != group_overloaded) {
> 
> Also, This load balancing happening due to llc could be very tricky to 
> debug.
> Any stats added to schedstat or sched/debug?

OK, we can add some in the next version.

Thanks,
Chenyu
