Message-ID: <CAPM31RJ_kcVs1JRyq8CR08JkJbSP0JGDpr_=_PXwHTPfEWuhug@mail.gmail.com>
Date:   Tue, 23 Aug 2016 13:40:01 -0700
From:   Paul Turner <pjt@...nel.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/fair: Fix fixed point arithmetic width for shares
 and effective load

On Mon, Aug 22, 2016 at 7:00 AM, Dietmar Eggemann
<dietmar.eggemann@....com> wrote:
>
> Since commit 2159197d6677 ("sched/core: Enable increased load resolution
> on 64-bit kernels") we now have two different fixed point units for
> load.
> shares in calc_cfs_shares() has 20 bit fixed point unit on 64-bit
> kernels. Therefore use scale_load() on MIN_SHARES.
> wl in effective_load() has 10 bit fixed point unit. Therefore use
> scale_load_down() on tg->shares which has 20 bit fixed point unit on
> 64-bit kernels.
>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@....com>
> ---
>
> Besides the issue with load_above_capacity when it comes to different
> fixed point units for load "[PATCH] sched/fair: Fix load_above_capacity
> fixed point arithmetic width" there are similar issues for shares in
> calc_cfs_shares() as well as wl in effective_load().
>
>  kernel/sched/fair.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 61d485421bed..18f80c4c7737 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2530,8 +2530,8 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>         if (tg_weight)
>                 shares /= tg_weight;
>
> -       if (shares < MIN_SHARES)
> -               shares = MIN_SHARES;
> +       if (shares < scale_load(MIN_SHARES))
> +               shares = scale_load(MIN_SHARES);
>         if (shares > tg->shares)
>                 shares = tg->shares;


MIN_SHARES is never scaled; it is an independent floor on the value
that can be assigned as a weight, so we never need to scale it down
(this would actually allow the weight to drop to zero which would be
bad).

>
>
> @@ -5023,9 +5023,9 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>                  * wl = S * s'_i; see (2)
>                  */
>                 if (W > 0 && w < W)
> -                       wl = (w * (long)tg->shares) / W;
> +                       wl = (w * (long)scale_load_down(tg->shares)) / W;
>                 else
> -                       wl = tg->shares;
> +                       wl = scale_load_down(tg->shares);


This part looks good, effective_load() works with
source_load/target_load values, which originate in unscaled values.

Independent of this patch:
  When we initially introduced load scaling, it was applied ~uniformly to every
value.  Most of the current pain has come from, and will continue to
come from, the fact that with v2 of the load-tracking this is no
longer the case.  We have a massive number of scaled and unscaled
inputs floating around, many of them derived values (e.g. source_load
above) which require chasing.

I propose we simplify this.  Load scaling is only here so that the
load-balancer can make sane decisions.  We only need
cfs_rq->load.weight values to be scaled.

This means:
  (1) scaling in calculating group se->load.weight (calc_cfs_shares)
  (2) probable scaling in h_load calculations
  (3) if you have a calculation involving a cfs_rq->load.weight value,
you may need to think about scaling.
       [There's a bunch of obvious places this covers, most of them
in the load-balancer.  There are some indirects, e.g. the removed need
to scale in calculating vruntime, but these are easy to audit just by
searching for existing calls to scale.]

Probably missed one, but the fact that this list can be written means
it's 1000 pages shorter than today's requirements.

The fact that (3) becomes, for the most part, the only rule to
remember makes reasoning about all of this stuff possible again,
because right now it sucks.
