Date:   Wed, 19 Dec 2018 14:39:58 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Morten Rasmussen <Morten.Rasmussen@....com>
Subject: Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing

On Wed, 19 Dec 2018 at 12:58, Valentin Schneider
<valentin.schneider@....com> wrote:
>
> On 19/12/2018 08:32, Vincent Guittot wrote:
> [...]
> > This is another UC; asym packing is used at SMT level for now and we
> > don't face this kind of problem there. It has also been tested on a
> > DynamiQ configuration, which is similar to SMT: 1 CPU per sched_group.
> > The legacy b.L configuration was not the main target, although it can be.
> >
> > The rounding error is there and should be fixed, and your UC is another
> > case, specific to legacy b.L, that can be treated in another patch.
> >
>
> I used that setup out of convenience for myself, but AFAICT that use-case
> just stresses that issue.

After looking at your UC in detail, your problem comes from the wl=1
for cpu0 whereas there is no running task.
But wl != 0 without a running task should not be possible since fdf5f3
("sched/fair: Disable LB_BIAS by default").

This explains the 1023 instead of 1022.
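
To put numbers on it (purely hypothetical values, not taken from your
trace): with SCHED_CAPACITY_SCALE = 1024 and a group_capacity of e.g.
1023, that spurious wl = 1 on the idle cpu0 is enough to move the
result by one:

        /* hypothetical values, only to illustrate the off-by-one */
        unsigned long capacity = 1023;            /* slightly pressured group  */
        unsigned long load     = 1022 + 1;        /* real task + spurious wl=1 */
        unsigned long avg      = load * 1024 / capacity;        /* = 1024 */
        unsigned long imb      = (avg * capacity + 512) / 1024; /* DIV_ROUND_CLOSEST -> 1023 */

With load = 1022 the same round trip gives 1021 here and can never
exceed 1022.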

>
> In the end we still have an imbalance value that can vary with only
> slight group_capacity changes. And that's true even if that group
> contains a single CPU, as the imbalance value computed by
> check_asym_packing() can vary by +/-1 with very tiny capacity changes
> (rt/IRQ pressure). That means that sometimes you're off by one and don't
> do the packing because some CPU was pressured, but some other time you
> might do it because that CPU was *more* pressured. It doesn't make sense.
>
>
> In the end problem is that in update_sg_lb_stats() we do:
>
>         sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
>
> and in check_asym_packing() we do:
>
>         env->imbalance = DIV_ROUND_CLOSEST(
>                 sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
>                 SCHED_CAPACITY_SCALE)
>
> So we end up with something like:
>
>                     group_load * SCHED_CAPACITY_SCALE * group_capacity
>         imbalance = --------------------------------------------------
>                             group_capacity * SCHED_CAPACITY_SCALE
>
> Which you'd hope to reduce down to:
>
>         imbalance = group_load
>
> but you get division errors all over the place. So actually, the fix we'd
> want would be:
>
> -----8<-----
> @@ -8463,9 +8463,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
>         if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
>                 return 0;
>
> -       env->imbalance = DIV_ROUND_CLOSEST(
> -               sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
> -               SCHED_CAPACITY_SCALE);
> +       env->imbalance = sds->busiest_stat.group_load;
>
>         return 1;
>  }
> ----->8-----
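
FWIW the drift is easy to reproduce outside the kernel with the same
integer arithmetic. Below is a quick userspace sketch (hypothetical
group_load, not taken from your traces) that replays the two divisions
and shows the imbalance moving around group_load as group_capacity
changes by a few units:

/*
 * Userspace sketch of the round trip discussed above (not the kernel
 * code path; group_load is a made-up value).  It replays:
 *
 *     avg_load  = group_load * SCHED_CAPACITY_SCALE / group_capacity
 *     imbalance = DIV_ROUND_CLOSEST(avg_load * group_capacity,
 *                                   SCHED_CAPACITY_SCALE)
 *
 * and prints how imbalance drifts around group_load as group_capacity
 * varies slightly (rt/IRQ pressure).
 */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE    1024UL
#define DIV_ROUND_CLOSEST(x, d) (((x) + (d) / 2) / (d))

int main(void)
{
        unsigned long group_load = 1022;  /* hypothetical busiest group load */
        unsigned long cap;

        for (cap = 1018; cap <= SCHED_CAPACITY_SCALE; cap++) {
                unsigned long avg = group_load * SCHED_CAPACITY_SCALE / cap;
                unsigned long imb = DIV_ROUND_CLOSEST(avg * cap,
                                                      SCHED_CAPACITY_SCALE);

                printf("capacity=%lu avg_load=%lu imbalance=%lu (group_load=%lu)\n",
                       cap, avg, imb, group_load);
        }
        return 0;
}

On this toy input, capacity 1023 already gives imbalance = 1021 while
the neighbouring values give 1022; with the change above the round trip
disappears and the imbalance is simply group_load, independent of
group_capacity.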
