[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d02f00b0-75ca-483d-85c9-82269cf70072@amd.com>
Date: Fri, 9 Jan 2026 16:06:45 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Valentin Schneider <vschneid@...hat.com>, Huang Shijie
<shijie8@...il.com>, <mingo@...hat.com>, <peterz@...radead.org>,
<vincent.guittot@...aro.org>
CC: <dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
<mgorman@...e.de>, <linux-kernel@...r.kernel.org>, <vineethr@...ux.ibm.com>,
<cl@...ux.com>
Subject: Re: [PATCH v7 1/1] sched: update the rq->avg_idle when a task is
moved to an idle CPU
Hello Valentin,
On 1/9/2026 2:42 PM, Valentin Schneider wrote:
> On 26/12/25 14:32, Huang Shijie wrote:
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3609,6 +3609,21 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
>> trace_sched_wakeup(p);
>> }
>>
>> +void update_rq_avg_idle(struct rq *rq)
>> +{
>> + if (rq->idle_stamp) {
>> + u64 delta = rq_clock(rq) - rq->idle_stamp;
>> + u64 max = 2*rq->max_idle_balance_cost;
>> +
>> + update_avg(&rq->avg_idle, delta);
>> +
>> + if (rq->avg_idle > max)
>> + rq->avg_idle = max;
>> +
>> + rq->idle_stamp = 0;
>> + }
>> +}
>> +
>
> So if we have this invoked every time we switch to the idle task via
> put_prev_task_idle(), do we want to move sched_balance_newidle()'s update
> of rq->idle_stamp() to set_next_task_idle()?
> > That does change the behaviour as we'd now record any idle duration as
> opposed to only idle-from-fair duration, but that would mean we'd
> unconditionally record a rq->idle_stamp and could thus ditch the if{} clause.
So I'm a wee bit skeptical of this - the avg_idle also serves as a
bailout for newidle_balance(). If a tasks keeps waking up during newidle
balance, we would like to discourage further attempts of newidle balance
for a while to avoid CPU being stuck doing newidle balance while having
runnable tasks waken up on it.
There is no bailout past should_we_balance(), and for large domains, it
can take a while to get out of balancing.
If we move this to {put_prev,set_next}_task_idle(), we'll completely
fail to capture that part of newidle balance bailout and I'm afraid
we'll start doing newidle balance more aggressively.
I'll get some data over the weekend for the different variants being
discussed here - if it doesn't reveal anything drastic, we can
consider moving this accounting to idle task's switch.
--
Thanks and Regards,
Prateek
Powered by blists - more mailing lists