Message-ID: <20260206183017.ovewgke6r4cvt5pf@airbuntu>
Date: Fri, 6 Feb 2026 18:30:17 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
	pierre.gondois@....com, kprateek.nayak@....com,
	hongyan.xia2@....com, christian.loehle@....com,
	luis.machado@....com
Subject: Re: [RFC PATCH 6/6 v8] sched/fair: Add EAS and idle cpu push trigger

On 12/02/25 19:12, Vincent Guittot wrote:
> EAS is based on wakeup events to efficiently place tasks on the system, but
> there are cases where a task no longer has wakeup events, or has them at far
> too low a pace. For such cases, we check if it's worth pushing the task to
> another CPU instead of putting it back in the enqueued list.
> 
> Wakeup events remain the main way to migrate tasks, but we now detect
> situations where a task is stuck on a CPU by checking that its utilization
> is larger than the max available compute capacity (max cpu capacity or
> uclamp max setting).
> 
> When the system becomes overutilized and some CPUs are idle, we try to
> push tasks instead of waiting for the periodic load balance.

I am fine with this wording. But I think "enable load balancing based on power"
is a very good description too. Basically we don't have the concept of down
migration on HMP systems to help save power for tasks that are hinted, via
uclamp_max, to be fine with running at a lower performance level.

> 
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
>  kernel/sched/fair.c     | 64 +++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/topology.c |  2 ++
>  2 files changed, 66 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0c0c675f39cf..e9e1d0c05805 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8500,8 +8500,72 @@ static inline bool sched_push_task_enabled(void)
>  	return static_branch_unlikely(&sched_push_task);
>  }
>  
> +static inline bool task_stuck_on_cpu(struct task_struct *p, int cpu)
> +{
> +	unsigned long max_capa, util;
> +
> +	max_capa = min(get_actual_cpu_capacity(cpu),
> +		       uclamp_eff_value(p, UCLAMP_MAX));

I think we should check whether uclamp_max == SCHED_CAPACITY_SCALE. Tasks with
the default uclamp_max are by definition not stuck because of a uclamp
restriction. I found that without this condition we can trigger this a lot
unnecessarily.
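Something along these lines, just to illustrate the idea (untested sketch,
reusing the helpers already used in this patch):

	/*
	 * A default uclamp_max means the task is not capped, so a uclamp
	 * restriction cannot be what keeps it stuck; bail out early and let
	 * the task_fits_cpu() check in sched_energy_push_task() handle the
	 * plain misfit case.
	 */
	if (uclamp_eff_value(p, UCLAMP_MAX) == SCHED_CAPACITY_SCALE)
		return false;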

> +	util = max(task_util_est(p), task_runnable(p));

We must take min(util, SCHED_CAPACITY_SCALE) here, since runnable can get too
large and make the condition above true even when the task is already on the
biggest-capacity CPU.
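For example (untested sketch, clamping with the existing SCHED_CAPACITY_SCALE
constant):

	/*
	 * Runnable can exceed SCHED_CAPACITY_SCALE on a contended CPU; clamp
	 * it so a task already running on the biggest CPU with no uclamp cap
	 * can never be reported as stuck.
	 */
	util = min(max(task_util_est(p), task_runnable(p)),
		   (unsigned long)SCHED_CAPACITY_SCALE);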

> +
> +	/*
> +	 * Return true only if the task might not sleep/wakeup because of a low
> +	 * compute capacity. Tasks, which wake up regularly, will be handled by
> +	 * feec().
> +	 */
> +	return (util > max_capa);
> +}
> +
> +static inline bool sched_energy_push_task(struct task_struct *p, struct rq *rq)
> +{
> +	if (!sched_energy_enabled())
> +		return false;
> +
> +	if (is_rd_overutilized(rq->rd))
> +		return false;
> +
> +	if (task_stuck_on_cpu(p, cpu_of(rq)))
> +		return true;
> +
> +	if (!task_fits_cpu(p, cpu_of(rq)))
> +		return true;
> +
> +	return false;
> +}
> +
> +static inline bool sched_idle_push_task(struct task_struct *p, struct rq *rq)
> +{
> +	if (rq->nr_running == 1)
> +		return false;
> +
> +	if (!is_rd_overutilized(rq->rd))
> +		return false;
> +
> +	/* If there are idle cpus in the llc then try to push the task on it */
> +	if (test_idle_cores(cpu_of(rq)))
> +		return true;
> +
> +	return false;
> +}
> +
> +
>  static bool fair_push_task(struct rq *rq, struct task_struct *p)
>  {
> +	if (!task_on_rq_queued(p))
> +		return false;
> +
> +	if (p->se.sched_delayed)
> +		return false;
> +
> +	if (p->nr_cpus_allowed == 1)
> +		return false;
> +
> +	if (sched_energy_push_task(p, rq))
> +		return true;
> +
> +	if (sched_idle_push_task(p, rq))
> +		return true;

In my testing (of an earlier version of the patch) I found that adding a new
is_rq_overloaded(rq) test, which simply checks if rq->nr_running > 1, helps
make the whole regular load balance not required at all here (i.e. we could get
rid of the overutilized dependence). Still testing it though; something to
consider now or later, I don't mind. A sketch of what I mean is below.
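Roughly (untested sketch; is_rq_overloaded() is a new helper name I am
proposing, not something that exists today):

	/* More than one runnable task means someone could be pushed away. */
	static inline bool is_rq_overloaded(struct rq *rq)
	{
		return rq->nr_running > 1;
	}

the idea being that the push triggers could rely on this per-rq test instead
of the rd-wide overutilized state.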

> +
>  	return false;
>  }
>  
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index cf643a5ddedd..00abd01acb84 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -391,10 +391,12 @@ static void sched_energy_set(bool has_eas)
>  		if (sched_debug())
>  			pr_info("%s: stopping EAS\n", __func__);
>  		static_branch_disable_cpuslocked(&sched_energy_present);
> +		static_branch_dec_cpuslocked(&sched_push_task);
>  	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
>  		if (sched_debug())
>  			pr_info("%s: starting EAS\n", __func__);
>  		static_branch_enable_cpuslocked(&sched_energy_present);
> +		static_branch_inc_cpuslocked(&sched_push_task);
>  	}
>  }
>  
> -- 
> 2.43.0
> 
