Message-ID: <18a0f0b1-24af-435f-815d-523935816d88@oracle.com>
Date: Wed, 9 Jul 2025 14:22:14 -0700
From: Libo Chen <libo.chen@...cle.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>
Cc: Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Tim Chen <tim.c.chen@...el.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Abel Wu <wuyun.abel@...edance.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>, Len Brown <len.brown@...el.com>,
linux-kernel@...r.kernel.org, Chen Yu <yu.c.chen@...el.com>
Subject: Re: [RFC patch v3 07/20] sched: Add helper function to decide whether
to allow cache aware scheduling
On 7/8/25 14:59, Tim Chen wrote:
> On Mon, 2025-07-07 at 17:41 -0700, Libo Chen wrote:
>> Hi Tim and Chenyu,
>>
>>
>> On 6/18/25 11:27, Tim Chen wrote:
>>> Cache-aware scheduling is designed to aggregate threads into their
>>> preferred LLC, either via the task wake up path or the load balancing
>>> path. One side effect is that when the preferred LLC is saturated,
>>> more threads will continue to be stacked on it, degrading the workload's
>>> latency. A strategy is needed to keep this aggregation from going so
>>> far that the preferred LLC becomes overloaded.
>>>
>>> Introduce helper function _get_migrate_hint() to implement the LLC
>>> migration policy:
>>>
>>> 1) A task is aggregated into its preferred LLC if both the source
>>> and destination LLCs are not too busy (<50% utilization, tunable),
>>> or if the preferred LLC will not become too imbalanced relative
>>> to the non-preferred LLC (>20% utilization, tunable, close to the
>>> imbalance_pct of the LLC domain).
>>> 2) A task is allowed to move from the preferred LLC to the
>>> non-preferred one if the move will not bring the non-preferred
>>> LLC so close in utilization to the preferred one that it prompts
>>> an aggregation migration back later. We are still experimenting
>>> with the aggregation and migration policy. Other possibilities
>>> include policies based on the LLC's load or on the average number
>>> of running tasks; those could be tried out by tweaking
>>> _get_migrate_hint().
>>>
>>> The function _get_migrate_hint() returns migration suggestions for the upper-level
>>> callers:
>>> +__read_mostly unsigned int sysctl_llc_aggr_cap = 50;
>>> +__read_mostly unsigned int sysctl_llc_aggr_imb = 20;
>>> +
>>
>>
>> I think this patch has a great potential.
>>
>
> Thanks for taking a look.
>
>> Since _get_migrate_hint() is tied to an individual task anyway, why not add a
>> per-task llc_aggr_imb which defaults to the sysctl one?
>>
>
> _get_migrate_hint() could also be called from llc_balance(). At that point
> we decide whether we should do llc_balance() without knowing which exact
> task we're going to move, while still observing the migration policy of
> not causing too much imbalance. So it may not be strictly tied to a task
> in the current implementation.
>
Ah right, by setting tsk_util to 0.
>> Tasks have different
>> preferences for llc stacking, they can all be running in the same system at the
>> same time. This way you can offer a greater deal of optimization without much
>> burden to others.
>
> You're thinking of something like a prctl knob that will bias aggregation for
> some process? Wonder if Peter has some opinion on this.
>
Yes. I am sure he does, but we can wait until the global approach is good
enough, as Chen Yu said.
>>
>> Also with sysctl_llc_aggr_imb, do we really need SCHED_CACHE_WAKE?
>>
>
> Actually, we think we can do without the SCHED_CACHE_WAKE feature and rely
> only on the load-balance SCHED_CACHE_LB, but we are still keeping it for now.
>
>> Does setting
>> sysctl_llc_aggr_imb to 0 basically say no preference for either LLC, no?
>
> Aggregation tends to make utilization on the preferred LLC higher than
> on the non-preferred one. The parameter "sysctl_llc_aggr_imb" is the
> imbalance allowed. If we set it to 0, then as long as the preferred LLC
> is not utilized more than the source LLC, we can still aggregate toward
> the preferred LLC, so a preference can still be there.
>
I see, I think I have a better understanding of this now. Thanks!
Libo
> Tim
>
>>
>> Thanks,
>> Libo
>>
>>> +static enum llc_mig_hint _get_migrate_hint(int src_cpu, int dst_cpu,
>>> +					   unsigned long tsk_util,
>>> +					   bool to_pref)
>>> +{
>>> +	unsigned long src_util, dst_util, src_cap, dst_cap;
>>> +
>>> +	if (cpus_share_cache(src_cpu, dst_cpu))
>>> +		return mig_allow;
>>> +
>>> +	if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
>>> +	    !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
>>> +		return mig_allow;
>>> +
>>> +	if (!fits_llc_capacity(dst_util, dst_cap) &&
>>> +	    !fits_llc_capacity(src_util, src_cap))
>>> +		return mig_ignore;
>>> +
>>> +	src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
>>> +	dst_util = dst_util + tsk_util;
>>> +	if (to_pref) {
>>> +		/*
>>> +		 * sysctl_llc_aggr_imb is the imbalance allowed between
>>> +		 * the preferred and non-preferred LLC.
>>> +		 * Don't migrate if this would leave the preferred LLC
>>> +		 * too heavily loaded while the dest is already much
>>> +		 * busier than the src, in which case the migration
>>> +		 * would increase the imbalance too much.
>>> +		 */
>>> +		if (!fits_llc_capacity(dst_util, dst_cap) &&
>>> +		    util_greater(dst_util, src_util))
>>> +			return mig_forbid;
>>> +	} else {
>>> +		/*
>>> +		 * Don't migrate if this would leave the preferred LLC
>>> +		 * too idle, or if it would bring the non-preferred LLC
>>> +		 * within sysctl_llc_aggr_imb percent of the preferred
>>> +		 * LLC, prompting a migration back to the preferred LLC
>>> +		 * later.
>>> +		 */
>>> +		if (fits_llc_capacity(src_util, src_cap) ||
>>> +		    !util_greater(src_util, dst_util))
>>> +			return mig_forbid;
>>> +	}
>>> +	return mig_allow;
>>> +}
>>
>>
>