Date:   Wed, 16 Sep 2020 08:53:32 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/4] sched/fair: reduce minimal imbalance threshold

On Tue, 15 Sep 2020 at 21:04, Valentin Schneider
<valentin.schneider@....com> wrote:
>
>
> On 14/09/20 11:03, Vincent Guittot wrote:
> > The default 25% imbalance threshold for the DIE and NUMA domains is
> > large enough to generate significant unfairness between threads. A
> > typical example is the case of 11 threads running on 2x4 CPUs. The
> > imbalance of 20% between the 2 groups of 4 cores is just low enough
> > not to trigger the load balance between the 2 groups. We will always
> > have the same 6 threads on one group of 4 CPUs and the other 5
> > threads on the other group of CPUs. With fair time sharing within
> > each group, we end up with +20% running time for the group of 5
> > threads.
> >
>
> AIUI this is the culprit:
>
>                 if (100 * busiest->avg_load <=
>                                 env->sd->imbalance_pct * local->avg_load)
>                         goto out_balanced;
>
> As in your case, imbalance_pct=120 (i.e. 100 * 6/5) becomes the tipping
> point.
>
> Now, ultimately this would need to scale based on the underlying topology,
> right? If you have a system with 2x32 cores running {33 threads, 34
> threads}, the tipping point becomes imbalance_pct≈103; but with that many
> more cores, whether such a small imbalance matters is somewhat
> questionable.
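
To make that cut-off concrete, here is a standalone sketch (plain C, not
kernel code; the thread counts and imbalance_pct values are just the
examples from this thread) of the out_balanced condition, approximating
each group's avg_load by its number of CPU-bound threads:

#include <stdio.h>

/*
 * Toy model of the find_busiest_group() cut-off quoted above: treat
 * each group's avg_load as proportional to its number of CPU-bound
 * threads.  Returns 1 when the scheduler would bail out via
 * out_balanced, i.e. no load balance between the two groups.
 */
static int out_balanced(int busiest_threads, int local_threads,
			int imbalance_pct)
{
	return 100 * busiest_threads <= imbalance_pct * local_threads;
}

int main(void)
{
	/* 11 threads on 2x4 CPUs: 6 vs 5; tipping point is 100*6/5 = 120 */
	printf("6 vs 5 @ pct=125: %s\n",
	       out_balanced(6, 5, 125) ? "out_balanced" : "balance");
	printf("6 vs 5 @ pct=117: %s\n",
	       out_balanced(6, 5, 117) ? "out_balanced" : "balance");

	/* 2x32 cores, {33, 34} threads: tipping point is 100*34/33 ~= 103 */
	printf("34 vs 33 @ pct=117: %s\n",
	       out_balanced(34, 33, 117) ? "out_balanced" : "balance");
	return 0;
}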

I wanted to stay conservative and not trigger too much task
migration because of a small imbalance, so I decided to decrease the
default threshold to the same level as the MC groups (117); but this
can still generate unfairness. With your example of 2x32 cores, if you
end up with 33 tasks in one group and 38 in the other, the system is
overloaded, so load and imbalance_pct are used; but the load ratio
(~115) stays below the new threshold, and the 38 tasks will each get
about 13% less running time than the 33.
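
As a sanity check on those numbers, a quick standalone computation
(plain C; the 2x32 topology and the 33-vs-38 split are just your
hypothetical example):

#include <stdio.h>

int main(void)
{
	double cpus  = 32.0;               /* CPUs per group */
	double small = 33.0, big = 38.0;   /* CPU-bound tasks per group */

	/* per-task CPU share with fair time sharing inside each group */
	double share_small = cpus / small; /* ~0.97 */
	double share_big   = cpus / big;   /* ~0.84 */

	/* load ratio compared against imbalance_pct: ~115, below 117 */
	printf("load ratio: %.0f\n", 100.0 * big / small);

	/* the 38 tasks each run ~13% less than the 33 ... */
	printf("deficit: -%.0f%%\n",
	       100.0 * (1.0 - share_big / share_small));

	/* ... equivalently, the 33 tasks each run ~15% more */
	printf("surplus: +%.0f%%\n",
	       100.0 * (share_small / share_big - 1.0));
	return 0;
}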

This new imbalance_pct seems like a reasonable step toward decreasing
the unfairness.

>
> > Decrease the imbalance threshold for the overloaded case, where load
> > is used to balance tasks, in order to ensure fair time sharing.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > ---
> >  kernel/sched/topology.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 9079d865a935..1a84b778755d 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -1337,7 +1337,7 @@ sd_init(struct sched_domain_topology_level *tl,
> >               .min_interval           = sd_weight,
> >               .max_interval           = 2*sd_weight,
> >               .busy_factor            = 32,
> > -             .imbalance_pct          = 125,
> > +             .imbalance_pct          = 117,
> >
> >               .cache_nice_tries       = 0,
