Open Source and information security mailing list archives
Message-ID: <6a36eae1-7d1c-4c23-aec3-056d641e3edd@intel.com>
Date: Tue, 8 Jul 2025 16:29:47 +0800
From: "Chen, Yu C" <yu.c.chen@...el.com>
To: Libo Chen <libo.chen@...cle.com>
CC: Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann
	<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
	<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
	<vschneid@...hat.com>, Tim Chen <tim.c.chen@...el.com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Abel Wu <wuyun.abel@...edance.com>, "Madadi
 Vineeth Reddy" <vineethr@...ux.ibm.com>, Hillf Danton <hdanton@...a.com>,
	"Len Brown" <len.brown@...el.com>, <linux-kernel@...r.kernel.org>, Tim Chen
	<tim.c.chen@...ux.intel.com>, Peter Zijlstra <peterz@...radead.org>, "Ingo
 Molnar" <mingo@...hat.com>, K Prateek Nayak <kprateek.nayak@....com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>
Subject: Re: [RFC patch v3 07/20] sched: Add helper function to decide whether
 to allow cache aware scheduling

On 7/8/2025 8:41 AM, Libo Chen wrote:
> Hi Tim and Chenyu,
> 
> 
> On 6/18/25 11:27, Tim Chen wrote:
>> Cache-aware scheduling is designed to aggregate threads into their
>> preferred LLC, either via the task wake up path or the load balancing
>> path. One side effect is that when the preferred LLC is saturated,
>> more threads will continue to be stacked on it, degrading the workload's
>> latency. A strategy is needed to prevent this aggregation from going too
>> far such that the preferred LLC is too overloaded.
>>
>> Introduce helper function _get_migrate_hint() to implement the LLC
>> migration policy:
>>
>> 1) A task is aggregated into its preferred LLC if both the source and
>>     destination LLCs are not too busy (<50% utilization, tunable), or
>>     if the preferred LLC will not become too far out of balance with
>>     the non-preferred LLC (>20% utilization, tunable, close to the
>>     imbalance_pct of the LLC domain).
>> 2) A task is allowed to move from the preferred LLC to the
>>     non-preferred one if the non-preferred LLC will not become too far
>>     out of balance with the preferred one, which would prompt an
>>     aggregation migration back later.  We are still experimenting with
>>     the aggregation and migration policy. Other possibilities are
>>     policies based on the LLC's load or the average number of running
>>     tasks.  Those could be tried out by tweaking _get_migrate_hint().
>>
>> The function _get_migrate_hint() returns migration suggestions for the upper-level functions:
>> +__read_mostly unsigned int sysctl_llc_aggr_cap       = 50;
>> +__read_mostly unsigned int sysctl_llc_aggr_imb       = 20;
>> +
> 
> 
> I think this patch has great potential.
> 
> Since _get_migrate_hint() is tied to an individual task anyway, why not add a
> per-task llc_aggr_imb which defaults to the sysctl one? Tasks have different
> preferences for llc stacking, they can all be running in the same system at the
> same time. This way you can offer a greater deal of optimization without much
> burden to others.

Yes, this is doable. It can be evaluated after the global generic strategy
has been verified to work, like NUMA balancing :)

> 
> Also with sysctl_llc_aggr_imb, do we really need SCHED_CACHE_WAKE? 

Do you mean the SCHED_CACHE_WAKE or SCHED_CACHE_LB?

> Does setting sysctl_llc_aggr_imb to 0 basically say no preference for either LLC, no?
> 

My understanding is that even if sysctl_llc_aggr_imb is 0, task
aggregation might still consider other aspects, such as whether the
target LLC's utilization has exceeded 50%.

thanks,
Chenyu

> Thanks,
> Libo
> 
>> +static enum llc_mig_hint _get_migrate_hint(int src_cpu, int dst_cpu,
>> +					   unsigned long tsk_util,
>> +					   bool to_pref)
>> +{
>> +	unsigned long src_util, dst_util, src_cap, dst_cap;
>> +
>> +	if (cpus_share_cache(src_cpu, dst_cpu))
>> +		return mig_allow;
>> +
>> +	if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
>> +	    !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
>> +		return mig_allow;
>> +
>> +	if (!fits_llc_capacity(dst_util, dst_cap) &&
>> +	    !fits_llc_capacity(src_util, src_cap))
>> +		return mig_ignore;
>> +
>> +	src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
>> +	dst_util = dst_util + tsk_util;
>> +	if (to_pref) {
>> +		/*
>> +		 * sysctl_llc_aggr_imb is the imbalance allowed between
>> +		 * the preferred LLC and the non-preferred LLC.
>> +		 * Don't migrate if it would leave the preferred LLC
>> +		 * too heavily loaded and the destination much busier
>> +		 * than the source, in which case migration would
>> +		 * increase the imbalance too much.
>> +		 */
>> +		if (!fits_llc_capacity(dst_util, dst_cap) &&
>> +		    util_greater(dst_util, src_util))
>> +			return mig_forbid;
>> +	} else {
>> +		/*
>> +		 * Don't migrate if it would leave the preferred LLC
>> +		 * too idle, or if it would leave the non-preferred LLC
>> +		 * within sysctl_llc_aggr_imb percent of the preferred
>> +		 * LLC, prompting a migration back to the preferred
>> +		 * LLC later.
>> +		 */
>> +		if (fits_llc_capacity(src_util, src_cap) ||
>> +		    !util_greater(src_util, dst_util))
>> +			return mig_forbid;
>> +	}
>> +	return mig_allow;
>> +}
> 
> 
