Message-ID: <2e988929-142c-4e69-8e2e-2f3e64c9f08c@arm.com>
Date: Fri, 5 Jul 2024 13:50:51 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>, Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Christian Loehle <christian.loehle@....com>,
Hongyan Xia <hongyan.xia2@....com>, John Stultz <jstultz@...gle.com>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6] sched: Consolidate cpufreq updates
On 05/07/2024 02:22, Qais Yousef wrote:
> On 07/04/24 12:12, Dietmar Eggemann wrote:
>> On 28/06/2024 03:52, Qais Yousef wrote:
>>> On 06/25/24 14:58, Dietmar Eggemann wrote:
>>>
>>>>> @@ -4917,6 +4927,84 @@ static inline void __balance_callbacks(struct rq *rq)
>>>>>
>>>>> #endif
>>>>>
>>>>> +static __always_inline void
>>>>> +__update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
>>>>> +{
>>>>> +#ifdef CONFIG_CPU_FREQ
>>>>> + if (prev && prev->dl.flags & SCHED_FLAG_SUGOV) {
>>>>> + /* Sugov just did an update, don't be too aggressive */
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * RT and DL should always send a freq update. But we can do some
>>>>> + * simple checks to avoid it when we know it's not necessary.
>>>>> + *
>>>>> + * iowait_boost will always trigger a freq update too.
>>>>> + *
>>>>> + * Fair tasks will only trigger an update if the root cfs_rq has
>>>>> + * decayed.
>>>>> + *
>>>>> + * Everything else should do nothing.
>>>>> + */
>>>>> + switch (current->policy) {
>>>>> + case SCHED_NORMAL:
>>>>> + case SCHED_BATCH:
>>>>
>>>> What about SCHED_IDLE tasks?
>>>
>>> I didn't think they matter from a cpufreq perspective. These tasks will just run
>>> at whatever frequency the idle system happens to be at and have no specific perf
>>> requirement, since they should only run when the system is idle, which is a recipe
>>> for starvation anyway?
>>
>> Not sure we are talking about the same thing here? idle_sched_class vs.
>> SCHED_IDLE policy (FAIR task with a tiny weight of WEIGHT_IDLEPRIO).
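To put a rough number on "tiny weight": a minimal sketch of the CPU share a SCHED_IDLE task gets under proportional (CFS-style) sharing, assuming the unscaled weights 1024 for a nice-0 task and WEIGHT_IDLEPRIO = 3 from kernel/sched/sched.h. The helper sk_share_permyriad() and the SK_* macros are hypothetical; load scaling and task-group hierarchy are ignored.

```c
#include <assert.h>

/* Unscaled weights, assumed to mirror kernel/sched: nice-0 and SCHED_IDLE. */
#define SK_NICE_0_WEIGHT	1024u
#define SK_WEIGHT_IDLEPRIO	3u

/*
 * Hypothetical helper: share of the CPU (in parts per 10000) that an entity
 * of weight 'w' gets on a runqueue whose total runnable weight is 'total'.
 */
static unsigned int sk_share_permyriad(unsigned int w, unsigned int total)
{
	return (w * 10000u) / total;
}
```

Next to a single nice-0 task, sk_share_permyriad(SK_WEIGHT_IDLEPRIO, SK_NICE_0_WEIGHT + SK_WEIGHT_IDLEPRIO) gives 29, i.e. roughly 0.3% of the CPU; when the SCHED_IDLE task is the only runnable FAIR task it gets the whole CPU (10000).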
>
> Yes, I am referring to the SCHED_IDLE policy too. What is your expectation? AFAIK
> the goal of this policy is to run when nothing else needs running.

IMHO, SCHED_IDLE tasks compete with all the other FAIR tasks for the rq.
I would include SCHED_IDLE in this switch statement next to
SCHED_NORMAL and SCHED_BATCH.

What do you do if only SCHED_IDLE FAIR tasks are runnable? They probably
also want their CPU frequency needs to be taken into account.
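A minimal user-space model of the policy switch with SCHED_IDLE grouped with the other FAIR policies, as suggested above. Assumptions: the SK_* constants mirror the UAPI policy numbers, root_decayed stands in for the root cfs_rq's 'decayed' flag, and RT/DL are simplified to "always update" (the patch's extra RT/DL checks and the iowait_boost path are left out). sk_want_cpufreq_update() is a hypothetical name, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Policy numbers as in include/uapi/linux/sched.h (4 is unused). */
enum {
	SK_SCHED_NORMAL   = 0,
	SK_SCHED_FIFO     = 1,
	SK_SCHED_RR       = 2,
	SK_SCHED_BATCH    = 3,
	SK_SCHED_IDLE     = 5,
	SK_SCHED_DEADLINE = 6,
};

/* Hypothetical model: should a context switch send a cpufreq update? */
static bool sk_want_cpufreq_update(int policy, bool root_decayed)
{
	switch (policy) {
	case SK_SCHED_NORMAL:
	case SK_SCHED_BATCH:
	case SK_SCHED_IDLE:	/* proposed: SCHED_IDLE is a FAIR policy too */
		return root_decayed;
	case SK_SCHED_FIFO:
	case SK_SCHED_RR:
	case SK_SCHED_DEADLINE:
		return true;	/* simplified: RT/DL always request an update */
	default:
		return false;
	}
}
```

With SCHED_IDLE in the FAIR group, a rq running only SCHED_IDLE tasks still triggers an update whenever the root cfs_rq has decayed, instead of falling through to "do nothing".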
[...]
>>>>> @@ -4766,11 +4738,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>>>> */
>>>>> detach_entity_load_avg(cfs_rq, se);
>>>>> update_tg_load_avg(cfs_rq);
>>>>> - } else if (decayed) {
>>>>> - cfs_rq_util_change(cfs_rq, 0);
>>>>> -
>>>>> - if (flags & UPDATE_TG)
>>>>> - update_tg_load_avg(cfs_rq);
>>>>> + } else if (cfs_rq->decayed && (flags & UPDATE_TG)) {
>>>>> + update_tg_load_avg(cfs_rq);
>>>>> }
>>>>> }
>>>>
>>>> You set cfs_rq->decayed for each taskgroup level but you only reset it
>>>> for the root cfs_rq in __update_cpufreq_ctx_switch() and task_tick_fair()?
>>>
>>> Yes. We only care about using it for root level. Tracking the information at
>>> cfs_rq level is the most natural way to do it as this is what update_load_avg()
>>> is acting on.
>>
>> But IMHO this creates an issue with those non-root cfs_rq's within
>
> I am not seeing the issue, could you expand on what it is?

I tried to explain it in the 4 lines below. With a local 'decayed',
update_cfs_rq_load_avg() and propagate_entity_load_avg() recompute it every
time update_load_avg() gets called, and it then determines whether
update_tg_load_avg() is called on this cfs_rq later in update_load_avg().

The new code:

	cfs_rq->decayed |= update_cfs_rq_load_avg() (*)
	cfs_rq->decayed |= propagate_entity_load_avg()

will never reset 'cfs_rq->decayed' for non-root cfs_rq's.

(*) You changed this in v3 from:

	cfs_rq->decayed = update_cfs_rq_load_avg()

>> update_load_avg() itself. They will stay decayed after cfs_rq->decayed
>> has been set to 1 once and will never be reset to 0. So with UPDATE_TG
>> update_tg_load_avg() will then always be called on those non-root
>> cfs_rq's all the time.
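A minimal user-space sketch of the sticky-flag effect described above, assuming a stripped-down cfs_rq: 'load_decayed' stands in for the results of update_cfs_rq_load_avg() / propagate_entity_load_avg(), and a counter stands in for calls to update_tg_load_avg(). The names v6_cfs_rq and v6_update_load_avg() are hypothetical, and the attach/detach branches of update_load_avg() are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a non-root struct cfs_rq. */
struct v6_cfs_rq {
	bool decayed;		/* models cfs_rq->decayed as used in v6 */
	int tg_updates;		/* counts update_tg_load_avg() stand-in calls */
};

/* Models the v6 tail of update_load_avg() with UPDATE_TG set or not. */
static void v6_update_load_avg(struct v6_cfs_rq *cfs_rq, bool load_decayed,
			       bool update_tg)
{
	/* OR-accumulates; nothing resets this for non-root cfs_rq's. */
	cfs_rq->decayed |= load_decayed;

	if (cfs_rq->decayed && update_tg)
		cfs_rq->tg_updates++;
}
```

After a single decay followed by two non-decaying updates, the flag is still set and the tg update has run three times instead of once.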
>
> We could add a check to update only the root cfs_rq. But what do we gain? Or
> IOW, what is the harm of unconditionally updating cfs_rq->decayed given that we
> only care about the root cfs_rq? I see more if conditions and branches which
> I am trying to avoid.

Yes, keep 'decayed' local and add:

	if (cfs_rq == &rq_of(cfs_rq)->cfs)
		cfs_rq->decayed = decayed;
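A minimal user-space sketch of the suggestion above, under the same simplifications: 'decayed' stays a local recomputed on every call, and only the root cfs_rq stores it for the cpufreq path. The types fix_rq / fix_cfs_rq and fix_update_load_avg() are hypothetical; rq_of(cfs_rq) is replaced by passing the rq explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for struct cfs_rq / struct rq. */
struct fix_cfs_rq {
	bool decayed;		/* only meaningful for the root cfs_rq */
	int tg_updates;		/* counts update_tg_load_avg() stand-in calls */
};

struct fix_rq {
	struct fix_cfs_rq cfs;	/* root cfs_rq, like rq->cfs */
};

static void fix_update_load_avg(struct fix_rq *rq, struct fix_cfs_rq *cfs_rq,
				bool load_decayed, bool update_tg)
{
	/* Local flag, recomputed on every call as before v3. */
	bool decayed = load_decayed;

	if (decayed && update_tg)
		cfs_rq->tg_updates++;

	/* Only the root cfs_rq publishes the flag. */
	if (cfs_rq == &rq->cfs)
		cfs_rq->decayed = decayed;
}
```

The cost is one pointer comparison per call; in exchange, non-root cfs_rq's only call update_tg_load_avg() when they actually decayed on that update.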