Message-ID: <CAKfTPtC+4=5M0RPjsOdi98KzkQYkfqxHsdK7cjk9m1RaFkhAjQ@mail.gmail.com>
Date: Fri, 9 Jan 2026 11:49:52 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Huang Shijie <shijie8@...il.com>, mingo@...hat.com, peterz@...radead.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, linux-kernel@...r.kernel.org, vineethr@...ux.ibm.com,
kprateek.nayak@....com, cl@...ux.com
Subject: Re: [PATCH v7 1/1] sched: update the rq->avg_idle when a task is
moved to an idle CPU
On Fri, 9 Jan 2026 at 10:12, Valentin Schneider <vschneid@...hat.com> wrote:
>
> On 26/12/25 14:32, Huang Shijie wrote:
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3609,6 +3609,21 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
> >  	trace_sched_wakeup(p);
> >  }
> >
> > +void update_rq_avg_idle(struct rq *rq)
> > +{
> > +	if (rq->idle_stamp) {
> > +		u64 delta = rq_clock(rq) - rq->idle_stamp;
> > +		u64 max = 2*rq->max_idle_balance_cost;
> > +
> > +		update_avg(&rq->avg_idle, delta);
> > +
> > +		if (rq->avg_idle > max)
> > +			rq->avg_idle = max;
> > +
> > +		rq->idle_stamp = 0;
> > +	}
> > +}
> > +
>
> So if we have this invoked every time we switch to the idle task via
> put_prev_task_idle(), do we want to move sched_balance_newidle()'s update
> of rq->idle_stamp to set_next_task_idle()?
I don't think that this is necessary. In the worst case, we set
idle_stamp in sched_balance_newidle() but a sched_ext task is picked
instead of going idle. The idle_stamp is then never used and is
overwritten the next time we try to pick the next task.
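
To make the worst case concrete, here is a minimal userspace sketch, not
kernel code. struct rq_model and every model_* name are hypothetical
stand-ins; the averaging step assumes update_avg() moves the average 1/8
of the way towards the new sample, and the clock values are arbitrary.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct rq fields discussed here. */
struct rq_model {
	uint64_t clock;
	uint64_t idle_stamp;
	uint64_t avg_idle;
	uint64_t max_idle_balance_cost;
};

/* Assumed averaging step: move 1/8 of the way towards the sample. */
static void model_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Models sched_balance_newidle() recording the start of an idle period. */
static void model_newidle_stamp(struct rq_model *rq)
{
	rq->idle_stamp = rq->clock;
}

/*
 * Models the patched put_prev_task_idle() path. Returns the idle
 * duration that was folded into avg_idle, or 0 if no stamp was set.
 */
static uint64_t model_put_prev_task_idle(struct rq_model *rq)
{
	uint64_t delta = 0;

	if (rq->idle_stamp) {
		uint64_t max = 2 * rq->max_idle_balance_cost;

		delta = rq->clock - rq->idle_stamp;
		model_update_avg(&rq->avg_idle, delta);
		if (rq->avg_idle > max)
			rq->avg_idle = max;
		rq->idle_stamp = 0;
	}
	return delta;
}

/* Worst case: the first stamp is taken but the CPU never goes idle. */
static uint64_t model_worst_case(struct rq_model *rq)
{
	rq->clock = 1000;
	model_newidle_stamp(rq);	/* a sched_ext task is picked instead */

	rq->clock = 5000;
	model_newidle_stamp(rq);	/* stale stamp is overwritten here */

	rq->clock = 5800;		/* the CPU really was idle since 5000 */
	return model_put_prev_task_idle(rq);
}
```

In this model only the second stamp contributes to the average; the
stale one is dropped without ever being read.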
>
> That does change the behaviour as we'd now record any idle duration as
> opposed to only idle-from-fair duration, but that would mean we'd
> unconditionally record a rq->idle_stamp and could thus ditch the if{} clause.
Yes, the if test is probably not necessary anymore.
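
For illustration only, a userspace sketch of what dropping the check
might look like, assuming the stamp is always recorded in
set_next_task_idle(). struct rq_sketch and the sketch_* names are
hypothetical, and the averaging step again assumes update_avg() moves
1/8 of the way towards the sample. This is not a proposed patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few struct rq fields discussed here. */
struct rq_sketch {
	uint64_t clock;
	uint64_t idle_stamp;
	uint64_t avg_idle;
	uint64_t max_idle_balance_cost;
};

/* Assumed averaging step: move 1/8 of the way towards the sample. */
static void sketch_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Assumption: every switch to the idle task records a stamp. */
static void sketch_set_next_task_idle(struct rq_sketch *rq)
{
	rq->idle_stamp = rq->clock;
}

/*
 * Since a stamp is always present when leaving idle, the
 * "if (rq->idle_stamp)" guard from the patch is dropped.
 * The rest mirrors update_rq_avg_idle() as posted.
 */
static void sketch_put_prev_task_idle(struct rq_sketch *rq)
{
	uint64_t delta = rq->clock - rq->idle_stamp;
	uint64_t max = 2 * rq->max_idle_balance_cost;

	sketch_update_avg(&rq->avg_idle, delta);

	if (rq->avg_idle > max)
		rq->avg_idle = max;

	rq->idle_stamp = 0;
}
```

The trade-off is the one noted above: every idle period would be
accounted, not only the ones entered from the fair class.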
>
> > static void
> > ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
> > struct rq_flags *rf)
> > @@ -3644,18 +3659,6 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
> >  		p->sched_class->task_woken(rq, p);
> >  		rq_repin_lock(rq, rf);
> >  	}
> > -
> > -	if (rq->idle_stamp) {
> > -		u64 delta = rq_clock(rq) - rq->idle_stamp;
> > -		u64 max = 2*rq->max_idle_balance_cost;
> > -
> > -		update_avg(&rq->avg_idle, delta);
> > -
> > -		if (rq->avg_idle > max)
> > -			rq->avg_idle = max;
> > -
> > -		rq->idle_stamp = 0;
> > -	}
> >  }
> >
> > /*
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index 65eb8f8c1a5d..aba5ad53c07d 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -460,6 +460,7 @@ static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, struct t
> >  {
> >  	update_curr_idle(rq);
> >  	scx_update_idle(rq, false, true);
> > +	update_rq_avg_idle(rq);
>
> AFAICT we can't have put_prev_task_idle() immediately followed by
> set_next_task_idle(); put_prev_set_next_task() especially already handles
> this, so I think we're good, but maybe worth mentioning in the changelog?
>
> > }
> >
> > static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 3ceaa9dc9a9e..6e3dd8c975e0 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1651,6 +1651,7 @@ static inline struct cfs_rq *group_cfs_rq(struct sched_entity *grp)
> >
> > #endif /* !CONFIG_FAIR_GROUP_SCHED */
> >
> > +extern void update_rq_avg_idle(struct rq *rq);
> > extern void update_rq_clock(struct rq *rq);
> >
> > /*
> > --
> > 2.43.0
>