linux-kernel - Re: [PATCH] fix scheduler regression from "sched/fair: Rework load


Message-ID: <CAKfTPtC10+xn3EGz8agfPCK_xarTDSOPENqoGYJ3mvJCtMUeYw@mail.gmail.com>
Date:   Mon, 26 Oct 2020 16:54:07 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Rik van Riel <riel@...riel.com>
Cc:     Chris Mason <clm@...com>, Peter Zijlstra <peterz@...radead.org>,
        Johannes Weiner <hannes@...xchg.org>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"

On Mon, 26 Oct 2020 at 16:42, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Mon, 26 Oct 2020 at 16:04, Rik van Riel <riel@...riel.com> wrote:
> >
> > On Mon, 2020-10-26 at 15:56 +0100, Vincent Guittot wrote:
> > > On Mon, 26 Oct 2020 at 15:38, Rik van Riel <riel@...riel.com> wrote:
> > > > On Mon, 2020-10-26 at 15:24 +0100, Vincent Guittot wrote:
> > > > > > On Monday 26 Oct. 2020 at 08:45:27 (-0400), Chris Mason wrote:
> > > > > > On 26 Oct 2020, at 4:39, Vincent Guittot wrote:
> > > > > >
> > > > > > > Hi Chris
> > > > > > >
> > > > > > > On Sat, 24 Oct 2020 at 01:49, Chris Mason <clm@...com> wrote:
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > > We’re validating a new kernel in the fleet, and compared with v5.2,
> > > > > > >
> > > > > > > > Which version are you using?
> > > > > > > > Several improvements have been added since v5.5 and the rework
> > > > > > > > of load_balance
> > > > > >
> > > > > > > We’re validating v5.6, but all of the numbers referenced in this
> > > > > > > patch are against v5.9.  I usually try to back port my way to
> > > > > > > victory on this kind of thing, but mainline seems to behave
> > > > > > > exactly the same as 0b0695f2b34a wrt this benchmark.
> > > > >
> > > > > ok. Thanks for the confirmation
> > > > >
> > > > > I have been able to reproduce the problem on my setup.
> > > > >
> > > > > Could you try the fix below ?
> > > > >
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -9049,7 +9049,8 @@ static inline void
> > > > > calculate_imbalance(struct
> > > > > lb_env *env, struct sd_lb_stats *s
> > > > >          * emptying busiest.
> > > > >          */
> > > > >         if (local->group_type == group_has_spare) {
> > > > > -               if (busiest->group_type > group_fully_busy) {
> > > > > +               if ((busiest->group_type > group_fully_busy) &&
> > > > > +                   (busiest->group_weight > 1)) {
> > > > >                         /*
> > > > > >                          * If busiest is overloaded, try to fill spare
> > > > > >                          * capacity. This might end up creating spare capacity
> > > > >
> > > > >
> > > > > > When we calculate an imbalance at the smallest level, i.e. between
> > > > > > CPUs (group_weight == 1), we should try to spread tasks on CPUs
> > > > > > instead of trying to fill spare capacity.
> > > >
> > > > Should we also spread tasks when balancing between
> > > > multi-threaded CPU cores on the same socket?
> > >
> > > > My explanation is probably misleading. In fact we already try to
> > > > spread tasks. We just use spare capacity instead of nr_running when
> > > > there is more than 1 CPU in the group and the group is overloaded.
> > > > Using spare capacity is a bit more conservative because it tries
> > > > not to pull more utilization than the spare capacity.
> >
> > Could utilization estimates be off, either lagging or
> > simply having a wrong estimate for a task, resulting
> > in no task getting pulled sometimes, while doing a
> > migrate_task imbalance always moves over something?
>
> Task and CPU utilization are not always fully in sync and may lag
> a bit, which explains why LB can sometimes fail to migrate for a small
> difference.

It can also come from util_est, which reports the max utilization of
the task so that LB only migrates a task to a CPU that will have enough
available capacity.
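
To make the difference between the two imbalance types concrete, here is
a minimal standalone C sketch. It is not the kernel code: struct
task_model, task_util_est_model(), pick_migration_type() and can_pull()
are hypothetical names used only for illustration. The logic is reduced
to the two checks discussed in this thread, and it assumes that the
estimate used for a task is the larger of its current utilization and
its util_est.

```c
/*
 * Simplified model of the load-balance decision discussed above.
 * All names are illustrative; none of this is the kernel's definition.
 */
enum migration_type { MIGRATE_UTIL, MIGRATE_TASK };

struct task_model {
	unsigned long util_avg;	/* current utilization, may lag behind */
	unsigned long util_est;	/* utilization remembered from earlier runs */
};

/* Assumption: the estimate is the max of the two signals. */
static unsigned long task_util_est_model(const struct task_model *p)
{
	return p->util_avg > p->util_est ? p->util_avg : p->util_est;
}

/*
 * Patched condition, assuming local already has spare capacity:
 * spare capacity is only used as the imbalance when busiest is
 * overloaded and has more than one CPU. Otherwise balance on the
 * number of tasks.
 */
static enum migration_type pick_migration_type(int busiest_overloaded,
						unsigned int busiest_group_weight)
{
	if (busiest_overloaded && busiest_group_weight > 1)
		return MIGRATE_UTIL;
	return MIGRATE_TASK;
}

/*
 * Returns 1 if the task may be pulled.
 * MIGRATE_UTIL: imbalance is spare capacity, the task must fit in it.
 * MIGRATE_TASK: imbalance is a task count, any task qualifies.
 */
static int can_pull(enum migration_type type, const struct task_model *p,
		    unsigned long imbalance)
{
	if (type == MIGRATE_UTIL)
		return task_util_est_model(p) <= imbalance;
	return imbalance > 0;
}
```

With util_avg = 100, util_est = 300 and 200 units of spare capacity, the
MIGRATE_UTIL check refuses the task because the estimate (300) exceeds
the spare capacity, whereas a MIGRATE_TASK imbalance of 1 accepts it.
This is the "no task getting pulled sometimes" case Rik describes.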

>
> >
> > Within an LLC we might not need to worry too much
> > about spare capacity, considering select_idle_sibling
> > doesn't give a hoot about capacity, either.
> >
> > --
> > All Rights Reversed.