linux-kernel - Re: [PATCH v6] sched: Consolidate cpufreq updates

Message-ID: <2e988929-142c-4e69-8e2e-2f3e64c9f08c@arm.com>
Date: Fri, 5 Jul 2024 13:50:51 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
 Viresh Kumar <viresh.kumar@...aro.org>, Ingo Molnar <mingo@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
 Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
 Daniel Bristot de Oliveira <bristot@...hat.com>,
 Valentin Schneider <vschneid@...hat.com>,
 Christian Loehle <christian.loehle@....com>,
 Hongyan Xia <hongyan.xia2@....com>, John Stultz <jstultz@...gle.com>,
 linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6] sched: Consolidate cpufreq updates

On 05/07/2024 02:22, Qais Yousef wrote:
> On 07/04/24 12:12, Dietmar Eggemann wrote:
>> On 28/06/2024 03:52, Qais Yousef wrote:
>>> On 06/25/24 14:58, Dietmar Eggemann wrote:
>>>
>>>>> @@ -4917,6 +4927,84 @@ static inline void __balance_callbacks(struct rq *rq)
>>>>>  
>>>>>  #endif
>>>>>  
>>>>> +static __always_inline void
>>>>> +__update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
>>>>> +{
>>>>> +#ifdef CONFIG_CPU_FREQ
>>>>> +	if (prev && prev->dl.flags & SCHED_FLAG_SUGOV) {
>>>>> +		/* Sugov just did an update, don't be too aggressive */
>>>>> +		return;
>>>>> +	}
>>>>> +
>>>>> +	/*
>>>>> +	 * RT and DL should always send a freq update. But we can do some
>>>>> +	 * simple checks to avoid it when we know it's not necessary.
>>>>> +	 *
>>>>> +	 * iowait_boost will always trigger a freq update too.
>>>>> +	 *
>>>>> +	 * Fair tasks will only trigger an update if the root cfs_rq has
>>>>> +	 * decayed.
>>>>> +	 *
>>>>> +	 * Everything else should do nothing.
>>>>> +	 */
>>>>> +	switch (current->policy) {
>>>>> +	case SCHED_NORMAL:
>>>>> +	case SCHED_BATCH:
>>>>
>>>> What about SCHED_IDLE tasks?
>>>
>>> I didn't think they matter from a cpufreq perspective. These tasks will just run
>>> at whatever frequency the idle system happens to be at and have no specific perf
>>> requirement, since they should only run when the system is idle, which is a recipe
>>> for starvation anyway?
>>
>> Not sure we are talking about the same thing here? idle_sched_class vs.
>> SCHED_IDLE policy (FAIR task with a tiny weight of WEIGHT_IDLEPRIO).
> 
> Yes, I am referring to the SCHED_IDLE policy too. What is your expectation? AFAIK
> the goal of this policy is to run when nothing else needs running.

IMHO, SCHED_IDLE tasks compete with all the other FAIR tasks for the rq
resource. I would include SCHED_IDLE in this switch statement next to
SCHED_NORMAL and SCHED_BATCH.
What do you do if only SCHED_IDLE FAIR tasks are runnable? They probably
also want to have their CPU frequency needs adjusted.
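
To illustrate what I mean, here is a minimal userspace sketch, not the
actual patch. All demo_* names and struct demo_rq are hypothetical
stand-ins; the RT/DL branch is simplified to "always update", and the
only point is that SCHED_IDLE takes the same path as the other FAIR
policies:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel policy constants. */
enum demo_policy {
	DEMO_SCHED_NORMAL   = 0,
	DEMO_SCHED_FIFO     = 1,
	DEMO_SCHED_RR       = 2,
	DEMO_SCHED_BATCH    = 3,
	DEMO_SCHED_IDLE     = 5,
	DEMO_SCHED_DEADLINE = 6,
};

/* Hypothetical, stripped-down runqueue: only the root cfs_rq flag. */
struct demo_rq {
	bool root_cfs_decayed;
};

/*
 * Returns true when a cpufreq update would be requested at context
 * switch. Assumption: RT and DL always request one (the patch applies
 * extra checks that are left out here).
 */
bool demo_ctx_switch_wants_update(struct demo_rq *rq, enum demo_policy policy)
{
	assert(rq != NULL);

	switch (policy) {
	case DEMO_SCHED_NORMAL:
	case DEMO_SCHED_BATCH:
	case DEMO_SCHED_IDLE:	/* also a FAIR task, just with a tiny weight */
		if (rq->root_cfs_decayed) {
			rq->root_cfs_decayed = false;
			return true;
		}
		return false;
	case DEMO_SCHED_FIFO:
	case DEMO_SCHED_RR:
	case DEMO_SCHED_DEADLINE:
		return true;
	}

	return false;
}
```

With this shape, a rq on which only SCHED_IDLE tasks are runnable still
gets a frequency update whenever the root cfs_rq has decayed.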

[...]

>>>>> @@ -4766,11 +4738,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>>>>>  		 */
>>>>>  		detach_entity_load_avg(cfs_rq, se);
>>>>>  		update_tg_load_avg(cfs_rq);
>>>>> -	} else if (decayed) {
>>>>> -		cfs_rq_util_change(cfs_rq, 0);
>>>>> -
>>>>> -		if (flags & UPDATE_TG)
>>>>> -			update_tg_load_avg(cfs_rq);
>>>>> +	} else if (cfs_rq->decayed && (flags & UPDATE_TG)) {
>>>>> +		update_tg_load_avg(cfs_rq);
>>>>>  	}
>>>>>  }
>>>>
>>>> You set cfs_rq->decayed for each taskgroup level but you only reset it
>>>> for the root cfs_rq in __update_cpufreq_ctx_switch() and task_tick_fair()?
>>>
>>> Yes. We only care about using it for root level. Tracking the information at
>>> cfs_rq level is the most natural way to do it as this is what update_load_avg()
>>> is acting on.
>>
>> But IMHO this creates an issue with those non-root cfs_rq's within
> 
> I am not seeing the issue, could you expand on what it is?

I tried to explain it in the 4 lines below. With a local 'decayed',
update_cfs_rq_load_avg() and propagate_entity_load_avg() set it every
time update_load_avg() gets called, and this then determines whether
update_tg_load_avg() is called on this cfs_rq later in update_load_avg().

The new code:

  cfs_rq->decayed |= update_cfs_rq_load_avg() (*)
  cfs_rq->decayed |= propagate_entity_load_avg()

will not reset 'cfs_rq->decayed' for non-root cfs_rq's.

(*) You changed this in v3 from:

  cfs_rq->decayed  = update_cfs_rq_load_avg()
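
A small userspace model of the sticky flag, as a sketch only. The
demo_* names, DEMO_UPDATE_TG and the tg_updates counter are
hypothetical; 'is_root' stands in for the root cfs_rq check, and
demo_consume_root() stands in for the reset done at context switch and
tick:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_UPDATE_TG 0x1	/* hypothetical stand-in for UPDATE_TG */

struct demo_cfs_rq {
	bool is_root;
	bool decayed;		/* models the per-cfs_rq flag */
	int tg_updates;		/* counts modelled update_tg_load_avg() calls */
};

/* Models the new code: the flag is OR-ed in and never cleared here. */
void demo_update_load_avg_sticky(struct demo_cfs_rq *cfs_rq,
				 bool pelt_decayed, int flags)
{
	assert(cfs_rq != NULL);

	cfs_rq->decayed |= pelt_decayed;
	if (cfs_rq->decayed && (flags & DEMO_UPDATE_TG))
		cfs_rq->tg_updates++;
}

/* Models the consumer side, which only resets the root cfs_rq. */
void demo_consume_root(struct demo_cfs_rq *cfs_rq)
{
	assert(cfs_rq != NULL);

	if (cfs_rq->is_root)
		cfs_rq->decayed = false;
}

/*
 * Runs n updates where only the first one reports decay and returns
 * how many times the tg update was triggered.
 */
int demo_run_sticky(bool is_root, int n)
{
	struct demo_cfs_rq cfs_rq = { .is_root = is_root };

	for (int i = 0; i < n; i++) {
		demo_update_load_avg_sticky(&cfs_rq, i == 0, DEMO_UPDATE_TG);
		demo_consume_root(&cfs_rq);
	}

	return cfs_rq.tg_updates;
}
```

In this model, demo_run_sticky(true, 5) triggers the tg update once,
whereas demo_run_sticky(false, 5) triggers it on all five calls even
though only the first one decayed.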

>> update_load_avg() itself. They will stay decayed after cfs_rq->decayed
>> has been set to 1 once and will never be reset to 0. So with UPDATE_TG
>> update_tg_load_avg() will then always be called on those non-root
>> cfs_rq's all the time.
> 
> We could add a check to update only the root cfs_rq. But what do we gain? Or
> IOW, what is the harm of unconditionally updating cfs_rq->decayed given that we
> only care about the root cfs_rq? I see more if conditions and branches which
> I am trying to avoid.

Yes, keep 'decayed' local and add a:

    if (cfs_rq == &rq_of(cfs_rq)->cfs)
        cfs_rq->decayed = decayed;
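
As a rough userspace sketch of that shape (not a tested kernel change):
the demo_* names, DEMO_UPDATE_TG and tg_updates are hypothetical, and
'is_root' stands in for the cfs_rq == &rq_of(cfs_rq)->cfs check. It
mirrors the plain assignment from above; whether the root flag should
accumulate with '|=' instead is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_UPDATE_TG 0x1	/* hypothetical stand-in for UPDATE_TG */

struct demo_cfs_rq {
	bool is_root;		/* stands in for the root cfs_rq check */
	bool decayed;		/* only meaningful for the root cfs_rq */
	int tg_updates;		/* counts modelled update_tg_load_avg() calls */
};

/*
 * 'decayed' stays local, so the tg update decision is made afresh on
 * every call. Only the root cfs_rq stores the result for later use.
 */
void demo_update_load_avg_local(struct demo_cfs_rq *cfs_rq,
				bool pelt_decayed, int flags)
{
	bool decayed = pelt_decayed;

	assert(cfs_rq != NULL);

	if (decayed && (flags & DEMO_UPDATE_TG))
		cfs_rq->tg_updates++;

	if (cfs_rq->is_root)
		cfs_rq->decayed = decayed;
}

/*
 * Runs n updates where only the first one reports decay and returns
 * how many times the tg update was triggered.
 */
int demo_run_local(bool is_root, int n)
{
	struct demo_cfs_rq cfs_rq = { .is_root = is_root };

	for (int i = 0; i < n; i++)
		demo_update_load_avg_local(&cfs_rq, i == 0, DEMO_UPDATE_TG);

	return cfs_rq.tg_updates;
}
```

Here a non-root cfs_rq triggers the tg update only on the call that
actually decayed, and its 'decayed' member is never written.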