linux-kernel - Re: [PATCH v2] sched/cfs: make util/load

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <afe5d97e-368e-84aa-5693-5c02d55836a2@arm.com>
Date:   Tue, 25 Apr 2017 12:05:16 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>, peterz@...radead.org,
        mingo@...nel.org, linux-kernel@...r.kernel.org
Cc:     Morten.Rasmussen@....com, yuyang.du@...el.com, pjt@...gle.com,
        bsegall@...gle.com
Subject: Re: [PATCH v2] sched/cfs: make util/load_avg more stable

On 19/04/17 17:54, Vincent Guittot wrote:
> In the current implementation of load/util_avg, we assume that the ongoing
> time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX,
> even if part of the time segment still remains to run. As a consequence, this
> remaining part is considered as idle time and generates unexpected variations
> of util_avg of a busy CPU in the range ]1002..1024[ whereas util_avg should

Why do you use the square brackets the other way around? Just curious.

1002 stands for 1024*y^1 w/ y = 4008/4096 or y^32 = 0.5, right ? Might
be worth mentioning.

> stay at 1023.
> 
> In order to keep the metric stable, we should not consider the ongoing time
> segment when computing load/util_avg but only the segments that have already
> fully elapsed. Bu to not consider the current time segment adds unwanted
> latency in the load/util_avg responsivness especially when the time is scaled
> instead of the contribution. Instead of waiting for the current time segment
> to have fully elapsed before accounting it in load/util_avg, we can already
> account the elapsed part but change the range used to compute load/util_avg
> accordingly.
> 
> At the very beginning of a new time segment, the past segments have been
> decayed and the max value is MAX_LOAD_AVG*y. At the very end of the current
> time segment, the max value becomes 1024(us) + MAX_LOAD_AVG*y which is equal
> to MAX_LOAD_AVG. In fact, the max value is
> sa->period_contrib + MAX_LOAD_AVG*y at any time in the time segment.
> 
> Taking advantage of the fact that MAX_LOAD_AVG*y == MAX_LOAD_AVG-1024, the
> range becomes [0..MAX_LOAD_AVG-1024+sa->period_contrib].
> 
> As the elapsed part is already accounted in load/util_sum, we update the max
> value according to the current position in the time segment instead of
> removing its contribution.

Removing its contribution stands for '- 1024' of 'LOAD_AVG_MAX - 1024'
which was added in patch 1/2?

> 
> Suggested-by: Peter Zijlstra <peterz@...radead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
> 
> Fold both patches in one
> 
>  kernel/sched/fair.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3f83a35..c3b8f0f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3017,12 +3017,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>  	/*
>  	 * Step 2: update *_avg.
>  	 */
> -	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
> +	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	if (cfs_rq) {
>  		cfs_rq->runnable_load_avg =
> -			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
> +			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  	}
> -	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> +	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
>  
>  	return 1;
>  }
>