linux-kernel - Re: [RFC] Documentation/scheduler/schedutil.txt

Message-ID: <20201120091356.GA2653684@google.com>
Date:   Fri, 20 Nov 2020 09:13:56 +0000
From:   Quentin Perret <qperret@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        dietmar.eggemann@....com, patrick.bellasi@...bug.net,
        lenb@...nel.org, linux-kernel@...r.kernel.org,
        valentin.schneider@....com, ionela.voinescu@....com,
        viresh.kumar@...aro.org
Subject: Re: [RFC] Documentation/scheduler/schedutil.txt

On Friday 20 Nov 2020 at 09:56:53 (+0100), Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 08:55:27AM +0100, Peter Zijlstra wrote:
> >  - In saturated scenarios task movement will cause some transient dips,
> >    suppose we have a CPU saturated with 4 tasks, then when we migrate a task
> >    to an idle CPU, the old CPU will have a 'running' value of 0.75 while the
> >    new CPU will gain 0.25. This is inevitable and time progression will
> >    correct this. XXX do we still guarantee f_max due to no idle-time?

The sugov_cpu_is_busy() logic should mitigate that, but looking at it
again I just realized we don't apply it to the 'shared' update path. I
can't recall why. Anybody?
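
For readers without the source at hand, here is a minimal user-space sketch of what that logic is meant to do, as I understand it: a CPU counts as busy if it has not entered idle since the previous governor update, and a busy CPU is not allowed to drop its frequency. The struct, field and function names below (model_cpu, model_cpu_is_busy(), model_next_freq()) are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU governor state. */
struct model_cpu {
	unsigned long idle_calls;       /* bumped each time the CPU enters idle */
	unsigned long saved_idle_calls; /* snapshot taken at the previous update */
	unsigned int cur_freq;          /* frequency chosen by the previous update */
};

/* Busy means: the CPU has not entered idle since the previous update. */
static bool model_cpu_is_busy(struct model_cpu *c)
{
	bool busy = c->idle_calls == c->saved_idle_calls;

	c->saved_idle_calls = c->idle_calls;
	return busy;
}

/* Pick the next frequency, refusing to go down while the CPU is busy. */
static unsigned int model_next_freq(struct model_cpu *c, unsigned int wanted)
{
	if (model_cpu_is_busy(c) && wanted < c->cur_freq)
		wanted = c->cur_freq;

	c->cur_freq = wanted;
	return wanted;
}
```

In this model, a transient dip in utilization right after a migration cannot lower the frequency of the source CPU as long as that CPU never goes idle in between.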

> Do we want something like this? Is the 1.5 threshold sane? (it's been too
> long since I looked at actual numbers here)
> 
> ---
> 
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 68d369cba9e4..f0bed8902c40 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -90,3 +90,4 @@ SCHED_FEAT(WA_BIAS, true)
>   */
>  SCHED_FEAT(UTIL_EST, true)
>  SCHED_FEAT(UTIL_EST_FASTUP, true)
> +SCHED_FEAT(UTIL_SAT, true)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 590e6f27068c..bf70e5ed8ba6 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2593,10 +2593,17 @@ static inline unsigned long cpu_util_dl(struct rq *rq)
>  	return READ_ONCE(rq->avg_dl.util_avg);
>  }
>  
> +#define RUNNABLE_SAT (SCHED_CAPACITY_SCALE + SCHED_CAPACITY_SCALE/2)
> +
>  static inline unsigned long cpu_util_cfs(struct rq *rq)
>  {
>  	unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
>  
> +	if (sched_feat(UTIL_SAT)) {
> +		if (READ_ONCE(rq->cfs.avg.runnable_avg) > RUNNABLE_SAT)
> +			return SCHED_CAPACITY_SCALE;
> +	}
> +
>  	if (sched_feat(UTIL_EST)) {
>  		util = max_t(unsigned long, util,
>  			     READ_ONCE(rq->cfs.avg.util_est.enqueued));

Need to do the math again, but it's an interesting idea and would solve
a few things (e.g. the overutilized flag getting reset because of the
'gap' left by a migration, and such) ...
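
To make the threshold concrete, here is a small user-space sketch of the check in the quoted patch. It leaves out the UTIL_EST part and the sched_feat() gating, and model_rq / model_cpu_util_cfs() are hypothetical names used only for illustration. The sample numbers further down assume that each always-runnable task contributes roughly one SCHED_CAPACITY_SCALE (1024) to runnable_avg, which is an assumption for the example, not a measurement.

```c
/* Mirrors SCHED_CAPACITY_SCALE; the 1.5 threshold works out to 1536. */
#define CAPACITY_SCALE 1024UL
#define RUNNABLE_SAT   (CAPACITY_SCALE + CAPACITY_SCALE / 2)

/* Hypothetical stand-in for the two PELT signals the patch reads. */
struct model_rq {
	unsigned long util_avg;     /* 'running' signal */
	unsigned long runnable_avg; /* 'runnable' signal */
};

/* Report full capacity when runnable_avg says the CPU is saturated. */
static unsigned long model_cpu_util_cfs(const struct model_rq *rq)
{
	if (rq->runnable_avg > RUNNABLE_SAT)
		return CAPACITY_SCALE;

	return rq->util_avg;
}
```

With the 4-task example from the top of this mail, the old CPU would be left with util_avg around 768 (0.75) but runnable_avg around 3072 under the assumption above, so it would still report 1024. The new CPU, with util_avg around 256 and runnable_avg around 1024, stays below the threshold and reports 256.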

Thanks,
Quentin