Message-ID: <tencent_A25A82E2DA498E4EB831310A21488D897C08@qq.com>
Date: Fri, 23 Jan 2026 02:13:46 +0800
From: Yangyu Chen <cyy@...self.name>
To: Tim Chen <tim.c.chen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Chen Yu <yu.c.chen@...el.com>, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Hillf Danton <hdanton@...a.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>,
Jianyong Wu <jianyong.wu@...look.com>, Tingyin Duan
<tingyin.duan@...il.com>, Vern Hao <vernhao@...cent.com>,
Vern Hao <haoxing990@...il.com>, Len Brown <len.brown@...el.com>,
Aubrey Li <aubrey.li@...el.com>, Zhao Liu <zhao1.liu@...el.com>,
Chen Yu <yu.chen.surf@...il.com>, Adam Li <adamli@...amperecomputing.com>,
Aaron Lu <ziqianlu@...edance.com>, Tim Chen <tim.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 03/23] sched/cache: Introduce helper functions to
enforce LLC migration policy
On 4/12/2025 07:07, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@...el.com>
>
> Cache-aware scheduling aggregates threads onto their preferred LLC,
> mainly through load balancing. When the preferred LLC becomes
> saturated, more threads are still placed there, increasing latency.
> A mechanism is needed to limit aggregation so that the preferred LLC
> does not become overloaded.
>
> Introduce helper functions can_migrate_llc() and
> can_migrate_llc_task() to enforce the LLC migration policy:
>
> 1. Aggregate a task to its preferred LLC if both source and
> destination LLCs are not too busy (<50% utilization),
Hi Chen Yu and Tim Chen,

I would like to ask why 50% was chosen as the threshold for treating an
LLC as busy. For example, a common AMD Zen 3-5 part has 8 cores per LLC.
With SMT turned off on such servers, an 8-thread process cannot be
aggregated within a single LLC. I think the default here could be 100%.
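
For concreteness, here is a rough standalone sketch (my own numbers, not
from the patch): assume SMT off, 8 cores per LLC, per-core capacity 1024
(so an LLC capacity of 8192), and fully busy threads. Plugging that into
the patch's fits_llc_capacity() with the default llc_overload_pct = 50:

#include <stdio.h>

static unsigned int llc_overload_pct = 50;

/* copied from the patch */
#define fits_llc_capacity(util, max) \
	((util) * 100 < (max) * llc_overload_pct)

int main(void)
{
	unsigned long cap = 8 * 1024;	/* assumed LLC capacity, SMT off */

	for (int threads = 1; threads <= 8; threads++) {
		unsigned long util = threads * 1024UL; /* fully busy threads */

		printf("%d busy threads: %s\n", threads,
		       fits_llc_capacity(util, cap) ?
		       "fits" : "LLC treated as busy");
	}
	return 0;
}

With these assumptions the LLC already counts as busy at 4 fully busy
threads, so only about half of the 8-thread process can be aggregated
into one LLC before aggregation stops.
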
Thanks,
Yangyu Chen
> or if doing so will not leave the preferred LLC much more
> imbalanced than the non-preferred one (>20% utilization
> difference, similar to imbalance_pct of the LLC domain).
> 2. Allow moving a task from an overloaded preferred LLC to a non-preferred
> LLC if this will not leave the non-preferred LLC so imbalanced that it
> triggers a later migration back.
> 3. If both LLCs are too busy, let generic load balancing spread
> the tasks.
>
> Further (hysteresis) action could be taken in the future to prevent tasks
> from being migrated into and out of the preferred LLC frequently (back and
> forth): the threshold for migrating a task out of its preferred LLC should
> be higher than that for migrating it into the LLC.
>
> Since aggregation tends to make the preferred LLC busier than others,
> the imbalance tolerance is controlled by llc_imb_pct. If set to 0,
> tasks may still aggregate to the preferred LLC as long as it is
> not more utilized than the source LLC, preserving the preference.
>
> Co-developed-by: Tim Chen <tim.c.chen@...ux.intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
>
> Notes:
> v1->v2:
> No change.
>
> kernel/sched/fair.c | 153 +++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/sched.h | 5 ++
> 2 files changed, 158 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b9f336300f14..710ed9943d27 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1205,6 +1205,9 @@ static s64 update_se(struct rq *rq, struct sched_entity *se)
> #define EPOCH_PERIOD (HZ / 100) /* 10 ms */
> #define EPOCH_LLC_AFFINITY_TIMEOUT 5 /* 50 ms */
>
> +__read_mostly unsigned int llc_overload_pct = 50;
> +__read_mostly unsigned int llc_imb_pct = 20;
> +
> static int llc_id(int cpu)
> {
> if (cpu < 0)
> @@ -9623,6 +9626,27 @@ static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_
> }
>
> #ifdef CONFIG_SCHED_CACHE
> +/*
> + * The margin used when comparing LLC utilization with CPU capacity.
> + * Parameter llc_overload_pct determines the LLC load level where
> + * active LLC aggregation is done.
> + * Derived from fits_capacity().
> + *
> + * (default: ~50%)
> + */
> +#define fits_llc_capacity(util, max) \
> + ((util) * 100 < (max) * llc_overload_pct)
> +
> +/*
> + * The margin used when comparing utilization:
> + * is 'util1' noticeably greater than 'util2'?
> + * Derived from capacity_greater().
> + * Bias is in percent.
> + */
> +/* Allows dst util to be bigger than src util by up to bias percent */
> +#define util_greater(util1, util2) \
> + ((util1) * 100 > (util2) * (100 + llc_imb_pct))
> +
> /* Called from load balancing paths with rcu_read_lock held */
> static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
> unsigned long *cap)
> @@ -9638,6 +9662,135 @@ static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
>
> return true;
> }
> +
> +/*
> + * Decision matrix according to the LLC utilization, used to
> + * decide whether we can do task aggregation across LLCs.
> + *
> + * By default, 50% is the threshold to treat the LLC as busy,
> + * and 20% is the utilization imbalance percentage to decide
> + * if the preferred LLC is busier than the non-preferred LLC.
> + * The hysteresis is used to avoid task bouncing between the
> + * preferred LLC and the non-preferred LLC.
> + *
> + * 1. moving towards the preferred LLC, dst is the preferred
> + * LLC, src is not.
> + *
> + * src \ dst 30% 40% 50% 60%
> + * 30% Y Y Y N
> + * 40% Y Y Y Y
> + * 50% Y Y G G
> + * 60% Y Y G G
> + *
> + * 2. moving out of the preferred LLC, src is the preferred
> + * LLC, dst is not:
> + *
> + * src \ dst 30% 40% 50% 60%
> + * 30% N N N N
> + * 40% N N N N
> + * 50% N N G G
> + * 60% Y N G G
> + *
> + * src : src_util
> + * dst : dst_util
> + * Y : Yes, migrate
> + * N : No, do not migrate
> + * G : let the Generic load balance to even the load.
> + *
> + * The intention is that if both LLCs are quite busy, cache aware
> + * load balance should not be performed, and generic load balance
> + * should take effect. However, if one is busy and the other is not,
> + * the preferred LLC capacity (50%) and imbalance criteria (20%) should
> + * be considered to determine whether LLC aggregation should be
> + * performed to bias the load towards the preferred LLC.
> + */
> +
> +/* Migration decision; the 3 states are mutually exclusive. */
> +enum llc_mig {
> + mig_forbid = 0, /* N: Don't migrate task, respect LLC preference */
> + mig_llc, /* Y: Do LLC preference based migration */
> + mig_unrestricted /* G: Don't restrict generic load balance migration */
> +};
> +
> +/*
> + * Check if task can be moved from the source LLC to the
> + * destination LLC without breaking cache aware preference.
> + * src_cpu and dst_cpu are arbitrary CPUs within the source
> + * and destination LLCs, respectively.
> + */
> +static enum llc_mig can_migrate_llc(int src_cpu, int dst_cpu,
> + unsigned long tsk_util,
> + bool to_pref)
> +{
> + unsigned long src_util, dst_util, src_cap, dst_cap;
> +
> + if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
> + !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
> + return mig_unrestricted;
> +
> + if (!fits_llc_capacity(dst_util, dst_cap) &&
> + !fits_llc_capacity(src_util, src_cap))
> + return mig_unrestricted;
> +
> + src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
> + dst_util = dst_util + tsk_util;
> + if (to_pref) {
> + /*
> + * llc_imb_pct is the imbalance allowed between
> + * preferred LLC and non-preferred LLC.
> + * Don't migrate if we will get preferred LLC too
> + * heavily loaded and if the dest is much busier
> + * than the src, in which case migration will
> + * increase the imbalance too much.
> + */
> + if (!fits_llc_capacity(dst_util, dst_cap) &&
> + util_greater(dst_util, src_util))
> + return mig_forbid;
> + } else {
> + /*
> + * Don't migrate if we will leave the preferred LLC
> + * too idle, or if this migration brings the
> + * non-preferred LLC within llc_imb_pct percent
> + * of the preferred LLC, leading to a migration
> + * back to the preferred LLC later.
> + */
> + if (fits_llc_capacity(src_util, src_cap) ||
> + !util_greater(src_util, dst_util))
> + return mig_forbid;
> + }
> + return mig_llc;
> +}
> +
> +/*
> + * Check if task p can migrate from source LLC to
> + * destination LLC in terms of cache aware load balance.
> + */
> +static __maybe_unused enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
> + struct task_struct *p)
> +{
> + struct mm_struct *mm;
> + bool to_pref;
> + int cpu;
> +
> + mm = p->mm;
> + if (!mm)
> + return mig_unrestricted;
> +
> + cpu = mm->mm_sched_cpu;
> + if (cpu < 0 || cpus_share_cache(src_cpu, dst_cpu))
> + return mig_unrestricted;
> +
> + if (cpus_share_cache(dst_cpu, cpu))
> + to_pref = true;
> + else if (cpus_share_cache(src_cpu, cpu))
> + to_pref = false;
> + else
> + return mig_unrestricted;
> +
> + return can_migrate_llc(src_cpu, dst_cpu,
> + task_util(p), to_pref);
> +}
> +
> #else
> static inline bool get_llc_stats(int cpu, unsigned long *util,
> unsigned long *cap)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 84118b522f22..bf72c5bab506 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2828,6 +2828,11 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
> extern unsigned int sysctl_numa_balancing_scan_size;
> extern unsigned int sysctl_numa_balancing_hot_threshold;
>
> +#ifdef CONFIG_SCHED_CACHE
> +extern unsigned int llc_overload_pct;
> +extern unsigned int llc_imb_pct;
> +#endif
> +
> #ifdef CONFIG_SCHED_HRTICK
>
> /*