[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200215215823.GY3420@suse.de>
Date: Sat, 15 Feb 2020 21:58:24 +0000
From: Mel Gorman <mgorman@...e.de>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
linux-kernel@...r.kernel.org, pauld@...hat.com,
parth@...ux.ibm.com, valentin.schneider@....com, hdanton@...a.com
Subject: Re: [PATCH v2 0/5] remove runnable_load_avg and improve
group_classify
On Fri, Feb 14, 2020 at 04:27:24PM +0100, Vincent Guittot wrote:
> This new version stays quite close to the previous one and should
> replace without problems the previous one that part of Mel's patchset:
> https://lkml.org/lkml/2020/2/14/156
>
As far as I can see, the differences are harmless and look sane. I do think
that an additional fix is mandatory as I see no reason why the regression
was fixed. As such, I'll release a v3 of the series that includes your
new patches with the minimal fix inserted where appropriate. I'll have
tests running over the rest of the weekend.
> Some hackbench results:
>
> - small arm64 dual quad cores system
> hackbench -l (2560/#grp) -g #grp
>
> grp tip/sched/core +patchset improvement
> 1 1,327(+/-10,06 %) 1,247(+/-5,45 %) 5,97 %
> 4 1,250(+/- 2,55 %) 1,207(+/-2,12 %) 3,42 %
> 8 1,189(+/- 1,47 %) 1,179(+/-1,93 %) 0,90 %
> 16 1,221(+/- 3,25 %) 1,219(+/-2,44 %) 0,16 %
>
> - large arm64 2 nodes / 224 cores system
> hackbench -l (256000/#grp) -g #grp
>
> grp tip/sched/core +patchset improvement
> 1 14,197(+/- 2,73 %) 13,917(+/- 2,19 %) 1,98 %
> 4 6,817(+/- 1,27 %) 6,523(+/-11,96 %) 4,31 %
> 16 2,930(+/- 1,07 %) 2,911(+/- 1,08 %) 0,66 %
> 32 2,735(+/- 1,71 %) 2,725(+/- 1,53 %) 0,37 %
> 64 2,702(+/- 0,32 %) 2,717(+/- 1,07 %) -0,53 %
> 128 3,533(+/-14,66 %) 3,123(+/-12,47 %) 11,59 %
> 256 3,918(+/-19,93 %) 3,390(+/- 5,93 %) 13,47 %
>
> The significant improvement for 128 and 256 should be taken with care
> because of some instabilities over iterations without the patchset.
>
For the most part I do not see similar results to this with hackbench with
one exception -- EPYC first generation. I don't have results for EPYC 2
yet but I'm curious if the machine you have has multiple L3 caches per
NUMA domain? Various Intel CPU generations show improvements but they're
not as dramatic. Tests will tell me for sure but I have some confidence
that it'll look like
Small tracing patches -- no difference
Vincent Patches 1-2 -- regressions
Fix from Mel -- small overall improvement on baseline
Vincent patches 3-5 -- small improvements mostly, sometimes big ones
on hackbench depending on the machine
Rest of Mel series -- generally ok across machines and CPU generations
Even if the improvements are not dramatic, I think it'll be worth it to
have NUMA and CPU balancer using similarly sane logic and overall I find
the load balancer easier to understand with the new logic so yey!
--
Mel Gorman
SUSE Labs
Powered by blists - more mailing lists