[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhsmhikdbndnq.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Fri, 09 Jan 2026 14:01:13 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: K Prateek Nayak <kprateek.nayak@....com>, Huang Shijie
<shijie8@...il.com>, mingo@...hat.com, peterz@...radead.org,
vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, linux-kernel@...r.kernel.org, vineethr@...ux.ibm.com,
cl@...ux.com
Subject: Re: [PATCH v7 1/1] sched: update the rq->avg_idle when a task is
moved to an idle CPU
On 09/01/26 16:06, K Prateek Nayak wrote:
> Hello Valentin,
>
> On 1/9/2026 2:42 PM, Valentin Schneider wrote:
>> On 26/12/25 14:32, Huang Shijie wrote:
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -3609,6 +3609,21 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
>>> trace_sched_wakeup(p);
>>> }
>>>
>>> +void update_rq_avg_idle(struct rq *rq)
>>> +{
>>> + if (rq->idle_stamp) {
>>> + u64 delta = rq_clock(rq) - rq->idle_stamp;
>>> + u64 max = 2*rq->max_idle_balance_cost;
>>> +
>>> + update_avg(&rq->avg_idle, delta);
>>> +
>>> + if (rq->avg_idle > max)
>>> + rq->avg_idle = max;
>>> +
>>> + rq->idle_stamp = 0;
>>> + }
>>> +}
>>> +
>>
>> So if we have this invoked every time we switch to the idle task via
>> put_prev_task_idle(), do we want to move sched_balance_newidle()'s update
>> of rq->idle_stamp() to set_next_task_idle()?
>> > That does change the behaviour as we'd now record any idle duration as
>> opposed to only idle-from-fair duration, but that would mean we'd
>> unconditionally record a rq->idle_stamp and could thus ditch the if{} clause.
>
> So I'm a wee bit skeptical of this - the avg_idle also serves as a
> bailout for newidle_balance(). If a tasks keeps waking up during newidle
> balance, we would like to discourage further attempts of newidle balance
> for a while to avoid CPU being stuck doing newidle balance while having
> runnable tasks waken up on it.
>
> There is no bailout past should_we_balance(), and for large domains, it
> can take a while to get out of balancing.
>
> If we move this to {put_prev,set_next}_task_idle(), we'll completely
> fail to capture that part of newidle balance bailout and I'm afraid
> we'll start doing newidle balance more aggressively.
>
Ah you're right, I'd forgotten about this, there's even a comment above the
idle_stamp update pointing this out... Although AFAICT that means we'd end
up with smaller rq->avg_idle and thus would newidle_balance() less often;
regardless that is a behaviour change.
> I'll get some data over the weekend for the different variants being
> discussed here - if it doesn't reveal anything drastic, we can
> consider moving this accounting to idle task's switch.
>
> --
> Thanks and Regards,
> Prateek
Powered by blists - more mailing lists