Message-ID: <CAKfTPtCQTMs=6yGbrfuUGyhwPHWvMODcyyDROVjbURHFxLTxCQ@mail.gmail.com>
Date:   Thu, 20 Dec 2018 15:50:00 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Morten Rasmussen <Morten.Rasmussen@....com>
Subject: Re: [PATCH 3/3] sched/fair: fix unnecessary increase of balance interval

On Thu, 20 Dec 2018 at 12:22, Valentin Schneider
<valentin.schneider@....com> wrote:
>
> On 20/12/2018 07:55, Vincent Guittot wrote:
> > In case of active balance, we increase the balance interval to cover
> > pinned task cases not covered by the all_pinned logic. Nevertheless,
> > the active migration triggered by asym packing should be treated as
> > the normal unbalanced case and reset the interval to the default
> > value, otherwise active migration for asym_packing can easily be
> > delayed for hundreds of ms because of this pinned task detection
> > mechanism.
> > The same happens for the other conditions tested in
> > need_active_balance(), like misfit tasks and when the capacity of
> > src_cpu is reduced compared to dst_cpu (see the comments in
> > need_active_balance() for details).
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> >  kernel/sched/fair.c | 40 +++++++++++++++++++++++++++-------------
> >  1 file changed, 27 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 487c73e..9b1e701 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8849,21 +8849,25 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >   */
> >  #define MAX_PINNED_INTERVAL  512
> >
> > -static int need_active_balance(struct lb_env *env)
> > +static inline bool
> > +asym_active_balance(struct lb_env *env)
> >  {
> > -     struct sched_domain *sd = env->sd;
> > +     /*
> > +      * ASYM_PACKING needs to force migrate tasks from busy but
> > +      * lower priority CPUs in order to pack all tasks in the
> > +      * highest priority CPUs.
> > +      */
> > +     return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> > +            sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +}
> >
> > -     if (env->idle != CPU_NOT_IDLE) {
> > +static inline bool
> > +voluntary_active_balance(struct lb_env *env)
> > +{
> > +     struct sched_domain *sd = env->sd;
> >
> > -             /*
> > -              * ASYM_PACKING needs to force migrate tasks from busy but
> > -              * lower priority CPUs in order to pack all tasks in the
> > -              * highest priority CPUs.
> > -              */
> > -             if ((sd->flags & SD_ASYM_PACKING) &&
> > -                 sched_asym_prefer(env->dst_cpu, env->src_cpu))
> > -                     return 1;
> > -     }
> > +     if (asym_active_balance(env))
> > +             return 1;
> >
> >       /*
> >        * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> > @@ -8881,6 +8885,16 @@ static int need_active_balance(struct lb_env *env)
> >       if (env->src_grp_type == group_misfit_task)
> >               return 1;
> >
> > +     return 0;
> > +}
> > +
>
> Yeah so that's the active balance classification I was afraid of, and I
> don't agree with it.
>
> The way I see things, we happen to have some mechanisms that let us know
> straight away if we need an active balance (asym packing, misfit, lowered
> capacity), and we rely on the sd->nr_balance_failed threshold for the
> scenarios where we don't have any more information.
>
> We do happen to have a threshold because we don't want to go overboard with
> it, but when it is reached it's a clear sign that we *do* want to active
> balance because that's all we can do to try and solve the imbalance.
>
> To me, those are all legitimate reasons to active balance. So they're
> all "voluntary" really; we *do* want all of these.
>
> > +static int need_active_balance(struct lb_env *env)
> > +{
> > +     struct sched_domain *sd = env->sd;
> > +
> > +     if (voluntary_active_balance(env))
> > +             return 1;
> > +
> >       return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
> >  }
> >
> > @@ -9142,7 +9156,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >       } else
> >               sd->nr_balance_failed = 0;
> >
> > -     if (likely(!active_balance)) {
> > +     if (likely(!active_balance) || voluntary_active_balance(&env)) {
>
> So now we reset the interval for all active balances (except the last
> active balance case), even when the active balance is done as a last
> resort because all other tasks were pinned.
>
> Arguably the current code isn't much better (it always increases the
> interval when active balancing), but at least it covers this case. It
> would be a waste not to take this into account when we can detect this
> scenario

So I'd like to recall the $subject of this patchset: fixing some known
issues for asym_packing.
While looking at this, we added a few other voluntary active balance
cases because it seemed "obvious" that these active migrations were
voluntary ones. But in fact, we don't have any use case that shows a
problem for those additional cases so far.

The default behavior for active migration is to increase the interval.
Now you want to extend the exception to other active migration use
cases, whereas it's not clear that those don't simply fall under the
default active migration case, where the interval should keep growing.
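
For context, the default logic at the end of load_balance() looks
roughly like this (the reset branch is quoted in the hunk above; the
back-off branch is paraphrased from memory and may differ in detail):

	if (likely(!active_balance)) {
		/* We were unbalanced, so reset the balancing interval */
		sd->balance_interval = sd->min_interval;
	} else {
		/*
		 * Active balancing happened: back off, so a task that
		 * turns out to be pinned is not retried too eagerly.
		 */
		if (sd->balance_interval < sd->max_interval)
			sd->balance_interval *= 2;
	}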

What you want is to change the behavior of the scheduler for use cases
that haven't raised any problem, whereas asym_packing has a known problem.

Changing the default behavior for active migration is not the subject
of this patchset and should be treated in another one, like the one
discussed with Peter a few days ago.

> (I'll reiterate my LBF_ALL_PINNED suggestion).
>
> >               /* We were unbalanced, so reset the balancing interval */
> >               sd->balance_interval = sd->min_interval;
> >       } else {
> >

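For reference, a minimal sketch of what an LBF_ALL_PINNED-based variant
of that interval reset could look like. This is hypothetical, not the
posted patch: it assumes LBF_ALL_PINNED is still set in env.flags at
this point whenever pinning forced the active balance, which is an
assumption about the suggestion rather than a statement of its details:

	if (likely(!active_balance) || !(env.flags & LBF_ALL_PINNED)) {
		/* We were unbalanced, so reset the balancing interval */
		sd->balance_interval = sd->min_interval;
	} else if (sd->balance_interval < sd->max_interval) {
		/* Pinned tasks forced the active balance: back off */
		sd->balance_interval *= 2;
	}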