Message-ID: <CAKfTPtBChhgqy7KaStP8JsF3eS4fw1SGNuSaJ0D-UO3WzqZqZA@mail.gmail.com>
Date: Mon, 9 Feb 2026 14:20:08 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
pierre.gondois@....com, kprateek.nayak@....com, hongyan.xia2@....com,
christian.loehle@....com, luis.machado@....com
Subject: Re: [RFC PATCH 6/6 v8] sched/fair: Add EAS and idle cpu push trigger
On Fri, 6 Feb 2026 at 19:30, Qais Yousef <qyousef@...alina.io> wrote:
>
> On 12/02/25 19:12, Vincent Guittot wrote:
> > EAS relies on wakeup events to place tasks efficiently on the system, but
> > there are cases where a task no longer has wakeup events, or has them at far
> > too low a pace. For such cases, we check whether it's worth pushing the task
> > to another CPU instead of putting it back in the enqueued list.
> >
> > Wakeup events remain the main way to migrate tasks, but we now detect the
> > situation where a task is stuck on a CPU by checking that its utilization
> > is larger than the max available compute capacity (max cpu capacity or
> > uclamp max setting).
> >
> > When the system becomes overutilized and some CPUs are idle, we try to
> > push tasks instead of waiting for the periodic load balance.
>
> I am fine with this wording. But I think "enable lb based on power" is a very
> good description too. Basically, we don't have the concept of down migration
> for HMP systems to help save power for tasks that are hinted, via uclamp_max,
> to be fine with running at a lower performance level.
>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> > kernel/sched/fair.c | 64 +++++++++++++++++++++++++++++++++++++++++
> > kernel/sched/topology.c | 2 ++
> > 2 files changed, 66 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0c0c675f39cf..e9e1d0c05805 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8500,8 +8500,72 @@ static inline bool sched_push_task_enabled(void)
> >  	return static_branch_unlikely(&sched_push_task);
> >  }
> >
> > +static inline bool task_stuck_on_cpu(struct task_struct *p, int cpu)
> > +{
> > +	unsigned long max_capa, util;
> > +
> > +	max_capa = min(get_actual_cpu_capacity(cpu),
> > +		       uclamp_eff_value(p, UCLAMP_MAX));
>
> I think we should check if uclamp_max == SCHED_CAPACITY_SCALE. By definition
> these tasks are not stuck. I found that without this condition we can trigger
> this a lot unnecessarily.
Okay, will check.
>
> > +	util = max(task_util_est(p), task_runnable(p));
>
> We must take the min(util, SCHED_CAPACITY_SCALE) here since runnable can get
> too large making the condition above true even if you are on the biggest
> capacity cpu.
Hmm, task_runnable should not go above SCHED_CAPACITY_SCALE. Have you
seen cases where a task's runnable_avg goes above
SCHED_CAPACITY_SCALE?
In fact, neither task_util_est nor task_runnable should go above
SCHED_CAPACITY_SCALE.
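
To make the two suggestions above concrete, here is a minimal userspace
sketch of the check with both applied. It is not the patch itself:
struct task_model, min_ul(), max_ul() and task_stuck_on_cpu_model() are
hypothetical stand-ins for the kernel helpers, and whether the clamp is
needed at all is still the open question above.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical stand-in for the per-task inputs of task_stuck_on_cpu(). */
struct task_model {
	unsigned long util_est;   /* models task_util_est(p) */
	unsigned long runnable;   /* models task_runnable(p) */
	unsigned long uclamp_max; /* models uclamp_eff_value(p, UCLAMP_MAX) */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* cpu_capacity models get_actual_cpu_capacity(cpu). */
static bool task_stuck_on_cpu_model(const struct task_model *p,
				    unsigned long cpu_capacity)
{
	unsigned long max_capa, util;

	assert(cpu_capacity <= SCHED_CAPACITY_SCALE);

	/* Suggestion 1: a task without a uclamp_max cap is not "stuck". */
	if (p->uclamp_max == SCHED_CAPACITY_SCALE)
		return false;

	max_capa = min_ul(cpu_capacity, p->uclamp_max);

	/* Suggestion 2: clamp util in case runnable exceeds the scale. */
	util = max_ul(p->util_est, p->runnable);
	util = min_ul(util, SCHED_CAPACITY_SCALE);

	return util > max_capa;
}
```

With the early return, an uncapped task is never reported as stuck, whatever
its runnable value; the clamp then only matters for tasks whose uclamp_max is
below SCHED_CAPACITY_SCALE.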
>
> > +
> > +	/*
> > +	 * Return true only if the task might not sleep/wakeup because of a low
> > +	 * compute capacity. Tasks, which wake up regularly, will be handled by
> > +	 * feec().
> > +	 */
> > +	return (util > max_capa);
> > +}
> > +
> > +static inline bool sched_energy_push_task(struct task_struct *p, struct rq *rq)
> > +{
> > +	if (!sched_energy_enabled())
> > +		return false;
> > +
> > +	if (is_rd_overutilized(rq->rd))
> > +		return false;
> > +
> > +	if (task_stuck_on_cpu(p, cpu_of(rq)))
> > +		return true;
> > +
> > +	if (!task_fits_cpu(p, cpu_of(rq)))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +static inline bool sched_idle_push_task(struct task_struct *p, struct rq *rq)
> > +{
> > +	if (rq->nr_running == 1)
> > +		return false;
> > +
> > +	if (!is_rd_overutilized(rq->rd))
> > +		return false;
> > +
> > +	/* If there are idle cpus in the llc then try to push the task on it */
> > +	if (test_idle_cores(cpu_of(rq)))
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +
> >  static bool fair_push_task(struct rq *rq, struct task_struct *p)
> >  {
> > +	if (!task_on_rq_queued(p))
> > +		return false;
> > +
> > +	if (p->se.sched_delayed)
> > +		return false;
> > +
> > +	if (p->nr_cpus_allowed == 1)
> > +		return false;
> > +
> > +	if (sched_energy_push_task(p, rq))
> > +		return true;
> > +
> > +	if (sched_idle_push_task(p, rq))
> > +		return true;
>
> In my testing (of an earlier version of the patch) I found that adding a new
> is_rq_overloaded(rq) test, which simply checks if rq->nr_running > 1, is
> helpful to make the whole regular lb not required at all (get rid of
> overutilized). Still testing it though; something to consider now or later.
> I don't mind.
I was conservative and didn't want to trigger push too often, but it
might end up being better. I will check.
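
For reference, the decision order in fair_push_task() can be modelled in
userspace as in the sketch below. The structs, the enum and the *_model
names are hypothetical, and so is the placement of the is_rq_overloaded()
test as a last fallback behind a flag; the suggestion above does not say
where the check would go, so treat that part as an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of the rq state the patch looks at. */
struct rq_model {
	unsigned int nr_running;
	bool rd_overutilized; /* models is_rd_overutilized(rq->rd) */
	bool has_idle_cpu;    /* models test_idle_cores(cpu_of(rq)) */
};

/* Hypothetical model of the task state the patch looks at. */
struct task_push_model {
	bool queued;          /* models task_on_rq_queued(p) */
	bool sched_delayed;   /* models p->se.sched_delayed */
	int nr_cpus_allowed;
	bool stuck;           /* models task_stuck_on_cpu() */
	bool fits;            /* models task_fits_cpu() */
};

enum push_reason { PUSH_NONE, PUSH_ENERGY, PUSH_IDLE, PUSH_OVERLOADED };

/* The suggested test: more than one runnable task on the rq. */
static bool is_rq_overloaded_model(const struct rq_model *rq)
{
	return rq->nr_running > 1;
}

static enum push_reason
fair_push_task_model(const struct rq_model *rq,
		     const struct task_push_model *p,
		     bool eas_enabled, bool use_overloaded)
{
	/* Tasks that cannot or should not be moved are never pushed. */
	if (!p->queued || p->sched_delayed || p->nr_cpus_allowed == 1)
		return PUSH_NONE;

	/* EAS push: only while the root domain is not overutilized. */
	if (eas_enabled && !rq->rd_overutilized && (p->stuck || !p->fits))
		return PUSH_ENERGY;

	/* Idle push: only while overutilized and an idle CPU exists. */
	if (rq->nr_running != 1 && rq->rd_overutilized && rq->has_idle_cpu)
		return PUSH_IDLE;

	/* Assumed placement of the suggested overloaded check. */
	if (use_overloaded && is_rq_overloaded_model(rq))
		return PUSH_OVERLOADED;

	return PUSH_NONE;
}
```

In this model the energy and idle triggers are mutually exclusive on the
overutilized state, so the overloaded test is the only one that can fire for
a fitting task on a non-overutilized system with two or more runnable tasks.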
>
> > +
> >  	return false;
> >  }
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index cf643a5ddedd..00abd01acb84 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -391,10 +391,12 @@ static void sched_energy_set(bool has_eas)
> >  		if (sched_debug())
> >  			pr_info("%s: stopping EAS\n", __func__);
> >  		static_branch_disable_cpuslocked(&sched_energy_present);
> > +		static_branch_dec_cpuslocked(&sched_push_task);
> >  	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> >  		if (sched_debug())
> >  			pr_info("%s: starting EAS\n", __func__);
> >  		static_branch_enable_cpuslocked(&sched_energy_present);
> > +		static_branch_inc_cpuslocked(&sched_push_task);
> >  	}
> >  }
> >
> > --
> > 2.43.0
> >