Message-ID: <ac29ecc0-13bc-4af4-b000-4846a40d9261@amperemail.onmicrosoft.com>
Date: Tue, 16 Dec 2025 17:49:00 +0800
From: Shijie Huang <shijie@...eremail.onmicrosoft.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Huang Shijie <shijie@...amperecomputing.com>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, patches@...erecomputing.com,
cl@...ux.com, Shubhang@...amperecomputing.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
linux-kernel@...r.kernel.org, vschneid@...hat.com, vineethr@...ux.ibm.com,
kprateek.nayak@....com
Subject: Re: [PATCH v6 2/2] sched: update the rq->avg_idle when a task is
moved to an idle CPU
On 16/12/2025 16:47, Vincent Guittot wrote:
> On Tue, 16 Dec 2025 at 08:39, Shijie Huang
> <shijie@...eremail.onmicrosoft.com> wrote:
>>
>> On 16/12/2025 15:17, Vincent Guittot wrote:
>>> On Tue, 16 Dec 2025 at 07:22, Shijie Huang
>>> <shijie@...eremail.onmicrosoft.com> wrote:
>>>> On 13/12/2025 09:36, Vincent Guittot wrote:
>>>>> put_prev_task_idle() would be a better place to call
>>>>> update_rq_avg_idle() because this is when we leave idle.
>>>> update_rq_avg_idle() is called not only by the current CPU but also by
>>>> other CPUs. For example, in try_to_wake_up(), update_rq_avg_idle() is
>>>> called by the other CPUs. So enqueue_task() is a good place.
>>> But put_prev_task_idle() is called by local CPU whenever it leaves
>>> idle so instead of trying to catch all places that could make the CPU
>>> leave idle it's better to use this single place.
>>> And as you mentioned, put_prev_task_idle is only called by local CPU
>>> whereas enqueue_task can be called by all CPUs creating useless
>>> pressure in the variable.
>> The rq->idle_stamp is set in sched_balance_newidle(). Then we call
>> update_rq_avg_idle() in put_prev_task_idle() right now. How can we update
>> the rq->avg_idle?
> I'm not sure I understand your point.
>
> rq->avg_idle tracks idle time. The easiest way would be to use
> - set_next_task_idle() when we enter idle
> - put_prev_task_idle() when we exit idle
>
> Except that sched_balance_newidle() can be long and the time should be
> accounted as idle time too. So instead of using set_next_task_idle(),
> we use sched_balance_newidle() to set . Which is okay because
> sched_balance_newidle() is always called before going to idle.
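To make the accounting described above concrete, here is a minimal
userspace sketch, not the patch itself. The toy_* names and struct toy_rq
are hypothetical stand-ins for the scheduler's rq and hooks, the timestamps
are passed in by hand instead of coming from the rq clock, and the 1/8
weight of the moving average is an assumption of this sketch.

```c
#include <stdint.h>

/* Toy stand-in for the per-CPU runqueue; only the two fields discussed
 * in this thread are modelled. */
struct toy_rq {
	uint64_t idle_stamp;	/* when idle accounting started, 0 if not idle */
	uint64_t avg_idle;	/* running average of idle durations */
};

/* Moving average: move avg toward the sample by 1/8 of the difference
 * (the weight is an assumption of this sketch). */
static void toy_update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Entering idle: stamp at the start of the newidle balance, so that the
 * time spent balancing is counted as idle time too. */
static void toy_sched_balance_newidle(struct toy_rq *rq, uint64_t now)
{
	rq->idle_stamp = now;
}

/* Leaving idle: reached only by the local CPU, so this single place
 * covers every way of leaving idle. */
static void toy_put_prev_task_idle(struct toy_rq *rq, uint64_t now)
{
	if (!rq->idle_stamp)
		return;		/* no stamp pending, nothing to account */

	toy_update_avg(&rq->avg_idle, now - rq->idle_stamp);
	rq->idle_stamp = 0;
}
```

With avg_idle starting at 800, stamping at t=1000 and leaving idle at
t=2600 gives a sample of 1600 and moves avg_idle to 900. A second call to
toy_put_prev_task_idle() without a new stamp changes nothing.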
Thanks for the explanations.
It seems that put_prev_task_idle() really is a better place to call
update_rq_avg_idle(). Let me think about it for a while :)
Thanks
Huang Shijie