Message-ID: <7806add6-8b3b-4dc6-b36c-4e7e23493a26@arm.com>
Date: Wed, 18 Sep 2024 09:01:38 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>
Cc: Juri Lelli <juri.lelli@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
John Stultz <jstultz@...gle.com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 12/16] sched/pelt: Add new waiting_avg to record when
runnable && !running
On 20/08/2024 18:35, Qais Yousef wrote:
> This info will be useful to understand how long tasks end up waiting
> behind other tasks. It is recorded for tasks only, and added to/subtracted
> from the root cfs_rq in __update_load_avg_se().
>
> It also helps to decouple util_avg, which indicates a task's computational
> demand, from the fact that the CPU might need to run faster to reduce the
> waiting time. This has been a point of confusion in the past while
> discussing uclamp and util_avg: not keeping the frequency high means tasks
> will take longer to run and cause delays. Isolating this source of delay
> into its own signal is a better way to take it into account when making
> decisions, independently of the task's/CPU's computational demand.
>
> It is not used yet, but will be used later to help drive DVFS headroom.
> It could also become a helpful metric for managing waiting latencies in
> general, for example in load balancing.
>
> TODO: waiting_avg should use rq_clock_task() as it doesn't care about
> invariance. Waiting time should reflect the actual wait in real time, as
> this is the measure of latency that users care about.
Since you use PELT for the update, you're bound to use rq_clock_pelt().
If we could have PELT with two time values, then we could have
'util_avg' and 'invariant util_avg' to cure the slow ramp-up on tiny CPUs
and/or at low OPPs, and we wouldn't have to add all of this extra code.
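From memory (so double-check against fair.c), the 'now' you pass down to
__update_load_avg_se() already comes from PELT time at the top of
update_load_avg():
	u64 now = cfs_rq_clock_pelt(cfs_rq);
so tracking waiting_avg in rq_clock_task() time would mean threading a
second clock value through the PELT update.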
[...]
> @@ -4744,8 +4760,15 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  	 * Track task load average for carrying it to new CPU after migrated, and
>  	 * track group sched_entity load average for task_h_load calculation in migration
>  	 */
> -	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
> +	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) {
> +		bool update_rq_waiting_avg = entity_is_task(se) && se_runnable(se);
> +
> +		if (update_rq_waiting_avg)
> +			sub_waiting_avg(&rq_of(cfs_rq)->cfs, se);
>  		__update_load_avg_se(now, cfs_rq, se);
> +		if (update_rq_waiting_avg)
> +			add_waiting_avg(&rq_of(cfs_rq)->cfs, se);
> +	}
That's a pretty convoluted design: a util_est-style attach/detach within
the PELT update, but only for tasks and not for all se's.
Doesn't 'p->se.avg.runnable_avg - p->se.avg.util_avg' give you what you
want? It's invariant, but so is the signal you're adding here.
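I.e. something along these lines (rough sketch, helper name made up):
	/* PELT contribution of time spent runnable but not running */
	static inline unsigned long task_waiting_avg(struct task_struct *p)
	{
		unsigned long runnable = READ_ONCE(p->se.avg.runnable_avg);
		unsigned long util = READ_ONCE(p->se.avg.util_avg);

		return runnable > util ? runnable - util : 0;
	}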
Commit 50181c0cff31 ("sched/pelt: Avoid underestimation of task
utilization") already uses some of it:
+	/*
+	 * To avoid underestimate of task utilization, skip updates of EWMA if
+	 * we cannot grant that thread got all CPU time it wanted.
+	 */
+	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
+		goto done;
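with task_runnable() being (quoting from memory):
	static inline unsigned long task_runnable(struct task_struct *p)
	{
		return READ_ONCE(p->se.avg.runnable_avg);
	}
so the per-task runnable signal is already there and maintained.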
[...]
> @@ -6786,6 +6814,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	 * estimated utilization, before we update schedutil.
>  	 */
>  	util_est_enqueue(&rq->cfs, p);
> +	add_waiting_avg(&rq->cfs, se);
This would also have to be checked against the new p->se.sched_delayed
thing.
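I.e. presumably something like (untested, just to illustrate where the
check would go):
	if (!p->se.sched_delayed)
		add_waiting_avg(&rq->cfs, se);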
>  	/*
>  	 * If in_iowait is set, the code below may not trigger any cpufreq
> @@ -6874,6 +6903,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	bool was_sched_idle = sched_idle_rq(rq);
>
>  	util_est_dequeue(&rq->cfs, p);
> +	sub_waiting_avg(&rq->cfs, se);
This won't compile: 'se' vs. '&p->se'.
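IOW this presumably needs to be:
	sub_waiting_avg(&rq->cfs, &p->se);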
[...]