Message-ID: <20240705002205.nnrgq7savzvsoqgl@airbuntu>
Date: Fri, 5 Jul 2024 01:22:05 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Christian Loehle <christian.loehle@....com>,
Hongyan Xia <hongyan.xia2@....com>,
John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6] sched: Consolidate cpufreq updates
On 07/04/24 12:12, Dietmar Eggemann wrote:
> On 28/06/2024 03:52, Qais Yousef wrote:
> > On 06/25/24 14:58, Dietmar Eggemann wrote:
> >
> >>> @@ -4917,6 +4927,84 @@ static inline void __balance_callbacks(struct rq *rq)
> >>>
> >>> #endif
> >>>
> >>> +static __always_inline void
> >>> +__update_cpufreq_ctx_switch(struct rq *rq, struct task_struct *prev)
> >>> +{
> >>> +#ifdef CONFIG_CPU_FREQ
> >>> + if (prev && prev->dl.flags & SCHED_FLAG_SUGOV) {
> >>> + /* Sugov just did an update, don't be too aggressive */
> >>> + return;
> >>> + }
> >>> +
> >>> + /*
> >>> + * RT and DL should always send a freq update. But we can do some
> >>> + * simple checks to avoid it when we know it's not necessary.
> >>> + *
> >>> + * iowait_boost will always trigger a freq update too.
> >>> + *
> >>> + * Fair tasks will only trigger an update if the root cfs_rq has
> >>> + * decayed.
> >>> + *
> >>> + * Everything else should do nothing.
> >>> + */
> >>> + switch (current->policy) {
> >>> + case SCHED_NORMAL:
> >>> + case SCHED_BATCH:
> >>
> >> What about SCHED_IDLE tasks?
> >
> > I didn't think they matter from a cpufreq perspective. These tasks will just
> > run at whatever frequency the system happens to be at and have no specific
> > perf requirement, since they should only run when the system is idle, which
> > is a recipe for starvation anyway?
>
> Not sure we talk about the same thing here? idle_sched_class vs.
> SCHED_IDLE policy (FAIR task with a tiny weight of WEIGHT_IDLEPRIO).
Yes, I am referring to the SCHED_IDLE policy too. What is your expectation? AFAIK
the goal of this policy is to run only when nothing else needs to run.
>
> >>> + if (unlikely(current->in_iowait)) {
> >>> + cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT | SCHED_CPUFREQ_FORCE_UPDATE);
> >>> + return;
> >>> + }
> >>> +
> >>> +#ifdef CONFIG_SMP
> >>> + if (unlikely(rq->cfs.decayed)) {
> >>> + rq->cfs.decayed = false;
> >>> + cpufreq_update_util(rq, 0);
> >>> + return;
> >>> + }
> >>> +#else
> >>> + cpufreq_update_util(rq, 0);
> >>> +#endif
> >>
> >> We can have !CONFIG_SMP and CONFIG_FAIR_GROUP_SCHED systems. Does this
> >> mean on those systems we call cpufreq_update_util() for each cfs_rq of
> >> the hierarchy where on CONFIG_SMP we only do this for the root cfs_rq?
> >
> > No. This is called on context switch only and the hierarchy doesn't matter
> > here. We just do it unconditionally for UP since we only track decayed at
> > the cfs_rq level and I didn't think it was worth trying to track it at the
> > rq level.
>
> OK, I see. The call in __update_cpufreq_ctx_switch() plus
> (task_tick_fair() and check_preempt_wakeup_fair()) are not related to a
> cfs_rq, but rather to the rq and/or task directly.
>
> Currently, for !CONFIG_SMP, we have the call in update_load_avg() where we
> use cfs_rq_util_change(), which only calls cpufreq_update_util() for the
> root cfs_rq, but this clearly has a cfs_rq context.
>
> >> [...]
> >>
> >>> @@ -4744,8 +4716,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >>> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
> >>> __update_load_avg_se(now, cfs_rq, se);
> >>>
> >>> - decayed = update_cfs_rq_load_avg(now, cfs_rq);
> >>> - decayed |= propagate_entity_load_avg(se);
> >>> + cfs_rq->decayed |= update_cfs_rq_load_avg(now, cfs_rq);
> >>> + cfs_rq->decayed |= propagate_entity_load_avg(se);
> >>>
> >>> if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
> >>>
> >>> @@ -4766,11 +4738,8 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> >>> */
> >>> detach_entity_load_avg(cfs_rq, se);
> >>> update_tg_load_avg(cfs_rq);
> >>> - } else if (decayed) {
> >>> - cfs_rq_util_change(cfs_rq, 0);
> >>> -
> >>> - if (flags & UPDATE_TG)
> >>> - update_tg_load_avg(cfs_rq);
> >>> + } else if (cfs_rq->decayed && (flags & UPDATE_TG)) {
> >>> + update_tg_load_avg(cfs_rq);
> >>> }
> >>> }
> >>
> >> You set cfs_rq->decayed for each taskgroup level but you only reset it
> >> for the root cfs_rq in __update_cpufreq_ctx_switch() and task_tick_fair()?
> >
> > Yes. We only care about using it for root level. Tracking the information at
> > cfs_rq level is the most natural way to do it as this is what update_load_avg()
> > is acting on.
>
> But IMHO this creates an issue with those non-root cfs_rq's within
I am not seeing the issue, could you expand on what it is?
> update_load_avg() itself. They will stay decayed after cfs_rq->decayed
> has been set to 1 once and will never be reset to 0. So with UPDATE_TG,
> update_tg_load_avg() will then be called on those non-root cfs_rq's
> every time.
We could add a check to update only the root cfs_rq. But what do we gain? Or
IOW, what is the harm of unconditionally updating cfs_rq->decayed, given that
we only care about the root cfs_rq? I see more if conditions and branches,
which I am trying to avoid.