Message-ID: <CAKfTPtA4h4FQoAEDjVeT4nWJCs5Lk5=9w4VnKq2wgpgJui7Y8w@mail.gmail.com>
Date: Wed, 12 Feb 2020 09:16:53 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Mel Gorman <mgorman@...e.de>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Phil Auld <pauld@...hat.com>, Parth Shah <parth@...ux.ibm.com>,
Valentin Schneider <valentin.schneider@....com>
Subject: Re: [PATCH 0/4] remove runnable_load_avg and improve group_classify
On Tue, 11 Feb 2020 at 22:04, Mel Gorman <mgorman@...e.de> wrote:
>
> On Tue, Feb 11, 2020 at 06:46:47PM +0100, Vincent Guittot wrote:
> > NUMA load balancing is the last remaining piece of code that uses the
> > runnable_load_avg of PELT to balance tasks between nodes. The normal
> > load_balance has replaced it by a better description of the current state
> > of the group of cpus. The same policy can be applied to the numa
> > balancing.
> >
> > Once unused, runnable_load_avg can be replaced by a simpler runnable_avg
> > signal that tracks the waiting time of tasks on rq. Currently, the state
> > of a group of CPUs is defined thanks to the number of running tasks and the
> > level of utilization of rq. But the utilization can be temporarily low
> > after the migration of a task whereas the rq is still overloaded with
> > tasks. In such case where tasks were competing for the rq, the
> > runnable_avg will stay high after the migration.
> >
> > Some hackbench results:
> >
> > - small arm64 dual quad cores system
> > hackbench -l (2560/#grp) -g #grp
> >
> > grp    tip/sched/core       +patchset           improvement
> > 1      1,327(+/-10,06 %)    1,247(+/-5,45 %)    5,97 %
> > 4      1,250(+/- 2,55 %)    1,207(+/-2,12 %)    3,42 %
> > 8      1,189(+/- 1,47 %)    1,179(+/-1,93 %)    0,90 %
> > 16     1,221(+/- 3,25 %)    1,219(+/-2,44 %)    0,16 %
> >
> > - large arm64 2 nodes / 224 cores system
> > hackbench -l (256000/#grp) -g #grp
> >
> > grp    tip/sched/core        +patchset             improvement
> > 1      14,197(+/- 2,73 %)    13,917(+/- 2,19 %)     1,98 %
> > 4       6,817(+/- 1,27 %)     6,523(+/-11,96 %)     4,31 %
> > 16      2,930(+/- 1,07 %)     2,911(+/- 1,08 %)     0,66 %
> > 32      2,735(+/- 1,71 %)     2,725(+/- 1,53 %)     0,37 %
> > 64      2,702(+/- 0,32 %)     2,717(+/- 1,07 %)    -0,53 %
> > 128     3,533(+/-14,66 %)     3,123(+/-12,47 %)    11,59 %
> > 256     3,918(+/-19,93 %)     3,390(+/- 5,93 %)    13,47 %
> >
>
> I haven't reviewed this yet because by co-incidence I'm finalising a
> series that tries to reconcile the load balancer with the NUMA balancer
That's interesting!
This series has been pending for a while and I have finally been able
to send it for review.
> and it has been very tricky to get right. One aspect though is that
I have been quite conservative in the policy: my main goal was not
to change the whole NUMA policy but to remove the last user of
runnable_load_avg, and I don't expect many behavior changes.
> hackbench is generally not long-running enough to detect any performance
> regressions in NUMA balancing. At least I've never observed it to be a
> good evaluation for NUMA balancing.
>
> > Without the patchset, there is a significant number of times when a CPU has
> > spare capacity with more than 1 running task. Although this is a valid
> > case, this is not a state that should often happen when 160 tasks are
> > competing on 8 cores like for this test. The patchset fixes the situation
> > by taking into account the runnable_avg, which stays high after the
> > migration of a task on another CPU.
> >
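To make the point above concrete, here is a toy sketch in Python. It is
not the kernel code: the GroupStats fields, the has_spare_capacity()
helper, the 117 margin and the sample numbers are all assumptions chosen
for illustration. It only shows how a utilization-only check can report
spare capacity right after a migration, while a check that also looks at
a runnable signal does not.

```python
from dataclasses import dataclass

# Assumed imbalance margin in percent; illustrative value only.
IMBALANCE_PCT = 117


@dataclass
class GroupStats:
    """Hypothetical summary of a group of CPUs (not a kernel structure)."""
    capacity: int    # total compute capacity of the group
    util: int        # summed utilization signal
    runnable: int    # summed runnable signal, a stand-in for runnable_avg
    nr_running: int  # tasks currently on the group's runqueues
    weight: int      # number of CPUs in the group


def has_spare_capacity(g: GroupStats, use_runnable: bool) -> bool:
    """Return True if the group looks like it can take more work."""
    # Fewer tasks than CPUs: at least one CPU is idle.
    if g.nr_running < g.weight:
        return True
    # With the runnable signal, tasks that have been waiting on the rq
    # keep the group from being classified as having spare capacity.
    if use_runnable and g.capacity * IMBALANCE_PCT < g.runnable * 100:
        return False
    # Utilization-only check: compare utilization (plus margin) to capacity.
    return g.capacity * 100 > g.util * IMBALANCE_PCT


# One CPU, two tasks still queued, utilization dropped after a migration
# while the runnable signal is still high. All numbers are made up.
after_migration = GroupStats(capacity=1024, util=400, runnable=1800,
                             nr_running=2, weight=1)

print(has_spare_capacity(after_migration, use_runnable=False))
print(has_spare_capacity(after_migration, use_runnable=True))
```

With these made-up numbers the utilization-only check sees spare
capacity although two tasks share one CPU, and the check that includes
the runnable signal does not.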
>
> FWIW, during the rewrite, I ended up moving away from runnable_load to
> get the load balancer and NUMA balancer to use the same metrics.
>
> --
> Mel Gorman
> SUSE Labs