Message-ID: <tencent_A25A82E2DA498E4EB831310A21488D897C08@qq.com>
Date: Fri, 23 Jan 2026 02:13:46 +0800
From: Yangyu Chen <cyy@...self.name>
To: Tim Chen <tim.c.chen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Chen Yu <yu.c.chen@...el.com>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>, Tingyin Duan
<tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>,
Vern Hao <haoxing990@...il.com>, Len Brown <len.brown@...el.com>,
Aubrey Li <aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>,
Chen Yu <yu.chen.surf@...il.com>, Adam Li <adamli@...amperecomputing.com>,
Aaron Lu <ziqianlu@...edance.com>, Tim Chen <tim.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 03/23] sched/cache: Introduce helper functions to
enforce LLC migration policy
On 4/12/2025 07:07, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@...el.com>
>
> Cache-aware scheduling aggregates threads onto their preferred LLC,
> mainly through load balancing. When the preferred LLC becomes
> saturated, more threads are still placed there, increasing latency.
> A mechanism is needed to limit aggregation so that the preferred LLC
> does not become overloaded.
>
> Introduce helper functions can_migrate_llc() and
> can_migrate_llc_task() to enforce the LLC migration policy:
>
> 1. Aggregate a task to its preferred LLC if both source and
> destination LLCs are not too busy (<50% utilization),
Hi Chen Yu and Tim Chen,

I would like to ask why 50% was chosen as the threshold for treating an
LLC as busy. For example, a common AMD Zen 3-5 part has 8 cores per LLC.
With SMT turned off on such servers, an 8-thread process cannot be
aggregated within a single LLC. I think the default here could be 100%.
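
For concreteness, here is a rough standalone sketch (my own numbers, not
from the patch): assume SMT off, 8 cores per LLC, per-core capacity 1024
(so an LLC capacity of 8192), and fully busy threads. Plugging that into
the patch's fits_llc_capacity() with the default llc_overload_pct = 50:

#include <stdio.h>

static unsigned int llc_overload_pct = 50;

/* copied from the patch */
#define fits_llc_capacity(util, max) \
	((util) * 100 < (max) * llc_overload_pct)

int main(void)
{
	unsigned long cap = 8 * 1024;	/* assumed LLC capacity, SMT off */

	for (int threads = 1; threads <= 8; threads++) {
		unsigned long util = threads * 1024UL; /* fully busy threads */

		printf("%d busy threads: %s\n", threads,
		       fits_llc_capacity(util, cap) ?
		       "fits" : "LLC treated as busy");
	}
	return 0;
}

With these assumptions the LLC already counts as busy at 4 fully busy
threads, so only about half of the 8-thread process can be aggregated
into one LLC before aggregation stops.
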
Thanks,
Yangyu Chen
> or if doing so will not leave the preferred LLC much more
> imbalanced than the non-preferred one (>20% utilization
> difference, similar to imbalance_pct of the LLC domain).
> 2. Allow moving a task from an overloaded preferred LLC to a non-preferred
> LLC if this will not leave the non-preferred LLC so imbalanced that it
> triggers a later migration back.
> 3. If both LLCs are too busy, let generic load balancing spread
> the tasks.
>
> Further (hysteresis) action could be taken in the future to prevent tasks
> from being migrated into and out of the preferred LLC frequently (back and
> forth): the threshold for migrating a task out of its preferred LLC should
> be higher than that for migrating it into the LLC.
>
> Since aggregation tends to make the preferred LLC busier than others,
> the imbalance tolerance is controlled by llc_imb_pct. If set to 0,
> tasks may still aggregate to the preferred LLC as long as it is
> not more utilized than the source LLC, preserving the preference.
>
> Co-developed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
>
> Notes:
> v1->v2:
> No change.
>
> kernel/sched/fair.c | 153 +++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/sched.h | 5 ++
> 2 files changed, 158 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b9f336300f14..710ed9943d27 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1205,6 +1205,9 @@ static s64 update_se(struct rq *rq, struct sched_entity *se)
> #define EPOCH_PERIOD (HZ / 100) /* 10 ms */
> #define EPOCH_LLC_AFFINITY_TIMEOUT 5 /* 50 ms */
>
> +__read_mostly unsigned int llc_overload_pct = 50;
> +__read_mostly unsigned int llc_imb_pct = 20;
> +
> static int llc_id(int cpu)
> {
> if (cpu < 0)
> @@ -9623,6 +9626,27 @@ static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_
> }
>
> #ifdef CONFIG_SCHED_CACHE
> +/*
> + * The margin used when comparing LLC utilization with CPU capacity.
> + * Parameter llc_overload_pct determines the LLC load level where
> + * active LLC aggregation is done.
> + * Derived from fits_capacity().
> + *
> + * (default: ~50%)
> + */
> +#define fits_llc_capacity(util, max) \
> + ((util) * 100 < (max) * llc_overload_pct)
> +
> +/*
> + * The margin used when comparing utilization:
> + * is 'util1' noticeably greater than 'util2'?
> + * Derived from capacity_greater().
> + * Bias is in percent.
> + */
> +/* Allows dst util to be bigger than src util by up to bias percent */
> +#define util_greater(util1, util2) \
> + ((util1) * 100 > (util2) * (100 + llc_imb_pct))
> +
> /* Called from load balancing paths with rcu_read_lock held */
> static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
> unsigned long *cap)
> @@ -9638,6 +9662,135 @@ static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
>
> return true;
> }
> +
> +/*
> + * Decision matrix according to the LLC utilization, used to
> + * decide whether we can do task aggregation across LLCs.
> + *
> + * By default, 50% is the threshold to treat the LLC as busy,
> + * and 20% is the utilization imbalance percentage to decide
> + * if the preferred LLC is busier than the non-preferred LLC.
> + * The hysteresis is used to avoid task bouncing between the
> + * preferred LLC and the non-preferred LLC.
> + *
> + * 1. moving towards the preferred LLC, dst is the preferred
> + * LLC, src is not.
> + *
> + * src \ dst 30% 40% 50% 60%
> + * 30% Y Y Y N
> + * 40% Y Y Y Y
> + * 50% Y Y G G
> + * 60% Y Y G G
> + *
> + * 2. moving out of the preferred LLC, src is the preferred
> + * LLC, dst is not:
> + *
> + * src \ dst 30% 40% 50% 60%
> + * 30% N N N N
> + * 40% N N N N
> + * 50% N N G G
> + * 60% Y N G G
> + *
> + * src : src_util
> + * dst : dst_util
> + * Y : Yes, migrate
> + * N : No, do not migrate
> + * G : let the Generic load balance to even the load.
> + *
> + * The intention is that if both LLCs are quite busy, cache aware
> + * load balance should not be performed, and generic load balance
> + * should take effect. However, if one is busy and the other is not,
> + * the preferred LLC capacity (50%) and imbalance criteria (20%) should
> + * be considered to determine whether LLC aggregation should be
> + * performed to bias the load towards the preferred LLC.
> + */
> +
> +/* Migration decision; the 3 states are mutually exclusive. */
> +enum llc_mig {
> + mig_forbid = 0, /* N: Don't migrate task, respect LLC preference */
> + mig_llc, /* Y: Do LLC preference based migration */
> + mig_unrestricted /* G: Don't restrict generic load balance migration */
> +};
> +
> +/*
> + * Check if task can be moved from the source LLC to the
> + * destination LLC without breaking cache aware preference.
> + * src_cpu and dst_cpu are arbitrary CPUs within the source
> + * and destination LLCs, respectively.
> + */
> +static enum llc_mig can_migrate_llc(int src_cpu, int dst_cpu,
> + unsigned long tsk_util,
> + bool to_pref)
> +{
> + unsigned long src_util, dst_util, src_cap, dst_cap;
> +
> + if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
> + !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
> + return mig_unrestricted;
> +
> + if (!fits_llc_capacity(dst_util, dst_cap) &&
> + !fits_llc_capacity(src_util, src_cap))
> + return mig_unrestricted;
> +
> + src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
> + dst_util = dst_util + tsk_util;
> + if (to_pref) {
> + /*
> + * llc_imb_pct is the imbalance allowed between
> + * preferred LLC and non-preferred LLC.
> + * Don't migrate if we will get preferred LLC too
> + * heavily loaded and if the dest is much busier
> + * than the src, in which case migration will
> + * increase the imbalance too much.
> + */
> + if (!fits_llc_capacity(dst_util, dst_cap) &&
> + util_greater(dst_util, src_util))
> + return mig_forbid;
> + } else {
> + /*
> + * Don't migrate if we will leave the preferred LLC
> + * too idle, or if this migration brings the
> + * non-preferred LLC within llc_imb_pct percent
> + * of the preferred LLC, leading to a migration
> + * back to the preferred LLC later.
> + */
> + if (fits_llc_capacity(src_util, src_cap) ||
> + !util_greater(src_util, dst_util))
> + return mig_forbid;
> + }
> + return mig_llc;
> +}
> +
> +/*
> + * Check if task p can migrate from source LLC to
> + * destination LLC in terms of cache aware load balance.
> + */
> +static __maybe_unused enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
> + struct task_struct *p)
> +{
> + struct mm_struct *mm;
> + bool to_pref;
> + int cpu;
> +
> + mm = p->mm;
> + if (!mm)
> + return mig_unrestricted;
> +
> + cpu = mm->mm_sched_cpu;
> + if (cpu < 0 || cpus_share_cache(src_cpu, dst_cpu))
> + return mig_unrestricted;
> +
> + if (cpus_share_cache(dst_cpu, cpu))
> + to_pref = true;
> + else if (cpus_share_cache(src_cpu, cpu))
> + to_pref = false;
> + else
> + return mig_unrestricted;
> +
> + return can_migrate_llc(src_cpu, dst_cpu,
> + task_util(p), to_pref);
> +}
> +
> #else
> static inline bool get_llc_stats(int cpu, unsigned long *util,
> unsigned long *cap)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 84118b522f22..bf72c5bab506 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2828,6 +2828,11 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
> extern unsigned int sysctl_numa_balancing_scan_size;
> extern unsigned int sysctl_numa_balancing_hot_threshold;
>
> +#ifdef CONFIG_SCHED_CACHE
> +extern unsigned int llc_overload_pct;
> +extern unsigned int llc_imb_pct;
> +#endif
> +
> #ifdef CONFIG_SCHED_HRTICK
>
> /*