Message-ID: <CAKfTPtCxki8E=9DqistC32xZJ4ozufb9jUOR=ro34BNNNJtJiw@mail.gmail.com>
Date:   Thu, 2 Jul 2020 18:28:45 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/fair: handle case of task_h_load() returning 0

On Thu, 2 Jul 2020 at 18:11, Valentin Schneider
<valentin.schneider@....com> wrote:
>
>
> On 02/07/20 15:42, Vincent Guittot wrote:
> > task_h_load() can return 0 in some situations, such as running stress-ng
> > mmapfork, which forks thousands of threads, in a sched group on a 224-core
> > system. The load balancer doesn't handle this correctly because
> > env->imbalance never decreases, so it stops pulling tasks only after
> > reaching loop_max, which can equal the number of running tasks in the
> > cfs_rq. Make sure that the imbalance decreases by at least 1.
> >
> > Misfit task handling is the other feature that doesn't deal with this
> > situation correctly, although the problem is probably harder to hit there
> > because of the smaller number of CPUs and running tasks on heterogeneous
> > systems.
> >
> > We can't simply make task_h_load() return at least one, because that
> > would require handling the resulting underrun in other places.
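
A rough illustration of the truncation, with assumed numbers in a
standalone C snippet (not the in-tree computation, which has the same
load_avg * h_load / (cfs_rq_load_avg + 1) shape):

---
/*
 * Illustrative arithmetic only: why the hierarchical load of one task
 * can truncate to zero.  Assumed values: a cfs_rq with thousands of
 * freshly forked threads, each with a small load_avg, while the
 * group's h_load stays bounded.
 */
#include <stdio.h>

int main(void)
{
	unsigned long task_load_avg = 10;        /* tiny per-thread load_avg */
	unsigned long h_load        = 1024;      /* group hierarchical load  */
	unsigned long cfs_rq_load   = 10 * 4000; /* thousands of such tasks  */

	/* same shape as task_h_load(): integer division truncates to 0 */
	unsigned long h = task_load_avg * h_load / (cfs_rq_load + 1);

	printf("task_h_load ~ %lu\n", h);        /* prints 0 */
	return 0;
}
---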
>
> Nasty one, that...
>
> Random thought: isn't that the kind of thing we have scale_load() and
> scale_load_down() for? There are more uses of task_h_load() than I'd like
> for this, but if we upscale its output (or introduce an upscaled variant),
> we could do something like:
>
> ---
> detach_tasks()
> {
>         long imbalance = env->imbalance;
>
>         if (env->migration_type == migrate_load)
>                 imbalance = scale_load(imbalance);
>
>         while (!list_empty(tasks)) {
>                 /* ... */
>                 switch (env->migration_type) {
>                 case migrate_load:
>                         load = task_h_load_upscaled(p);
>                         /* ... usual bits here ...*/
>                         lsub_positive(&env->imbalance, load);
>                         break;
>                         /* ... */
>                 }
>
>                 if (!scale_load_down(env->imbalance))
>                         break;
>         }
> }
> ---
>
> It's not perfect, and there's still the misfit situation to sort out -
> still, do you think this is something we could go towards?

This will not work for 32-bit systems.

For 64-bit, I have to think a bit more about whether the upscale would fix
all cases and support propagation across the hierarchy. In that case we
could also consider making scale_load()/scale_load_down() a nop all the
time.
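
For reference, a sketch of why the extra fixed-point resolution only
exists on 64-bit (abridged from kernel/sched/sched.h; the in-tree
definitions carry a few more details):

---
#ifdef CONFIG_64BIT
# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
# define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
#else
/* On 32-bit, the shift is not applied: both helpers are plain no-ops. */
# define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
# define scale_load(w)		(w)
# define scale_load_down(w)	(w)
#endif
---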

>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> >  kernel/sched/fair.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 6fab1d17c575..62747c24aa9e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4049,7 +4049,13 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> >               return;
> >       }
> >
> > -     rq->misfit_task_load = task_h_load(p);
> > +     /*
> > +      * Make sure that misfit_task_load will not be zero even if
> > +      * task_h_load() returns 0. misfit_task_load is only used to select
> > +      * the rq with the highest load, so adding 1 will not change the
> > +      * result of the comparison.
> > +      */
> > +     rq->misfit_task_load = task_h_load(p) + 1;
> >  }
> >
> >  #else /* CONFIG_SMP */
> > @@ -7664,6 +7670,16 @@ static int detach_tasks(struct lb_env *env)
> >                           env->sd->nr_balance_failed <= env->sd->cache_nice_tries)
> >                               goto next;
> >
> > +                     /*
> > +                      * Depending on the number of CPUs and tasks and the
> > +                      * cgroup hierarchy, task_h_load() can return a zero
> > +                      * value. Make sure that env->imbalance decreases,
> > +                      * otherwise detach_tasks() will stop only after
> > +                      * detaching up to loop_max tasks.
> > +                      */
> > +                     if (!load)
> > +                             load = 1;
> > +
> >                       env->imbalance -= load;
> >                       break;
