linux-kernel - Re: [PATCH v2 0/5] remove runnable_load_avg and improve group

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200215215823.GY3420@suse.de>
Date:   Sat, 15 Feb 2020 21:58:24 +0000
From:   Mel Gorman <mgorman@...e.de>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        linux-kernel@...r.kernel.org, pauld@...hat.com,
        parth@...ux.ibm.com, valentin.schneider@....com, hdanton@...a.com
Subject: Re: [PATCH v2 0/5] remove runnable_load_avg and improve
 group_classify

On Fri, Feb 14, 2020 at 04:27:24PM +0100, Vincent Guittot wrote:
> This new version stays quite close to the previous one and should 
> replace without problems the previous one that part of Mel's patchset:
> https://lkml.org/lkml/2020/2/14/156
> 

As far as I can see, the differences are harmless and look sane. I do think
that an additional fix is mandatory as I see no reason why the regression
was fixed. As such, I'll release a v3 of the series that includes your
new patches with the minimal fix inserted where appropriate. I'll have
tests running over the rest of the weekend.

> Some hackbench results:
> 
> - small arm64 dual quad cores system
> hackbench -l (2560/#grp) -g #grp
> 
> grp    tip/sched/core         +patchset              improvement
> 1       1,327(+/-10,06 %)     1,247(+/-5,45 %)       5,97 %
> 4       1,250(+/- 2,55 %)     1,207(+/-2,12 %)       3,42 %
> 8       1,189(+/- 1,47 %)     1,179(+/-1,93 %)       0,90 %
> 16      1,221(+/- 3,25 %)     1,219(+/-2,44 %)       0,16 %						
> 
> - large arm64 2 nodes / 224 cores system
> hackbench -l (256000/#grp) -g #grp
> 
> grp    tip/sched/core         +patchset              improvement
> 1      14,197(+/- 2,73 %)     13,917(+/- 2,19 %)     1,98 %
> 4       6,817(+/- 1,27 %)      6,523(+/-11,96 %)     4,31 %
> 16      2,930(+/- 1,07 %)      2,911(+/- 1,08 %)     0,66 %
> 32      2,735(+/- 1,71 %)      2,725(+/- 1,53 %)     0,37 %
> 64      2,702(+/- 0,32 %)      2,717(+/- 1,07 %)    -0,53 %
> 128     3,533(+/-14,66 %)     3,123(+/-12,47 %)     11,59 %
> 256     3,918(+/-19,93 %)     3,390(+/- 5,93 %)     13,47 %
> 
> The significant improvement for 128 and 256 should be taken with care
> because of some instabilities over iterations without the patchset.
> 

For the most part I do not see similar results to this with hackbench with
one exception -- EPYC first generation. I don't have results for EPYC 2
yet but I'm curious if the machine you have has multiple L3 caches per
NUMA domain? Various Intel CPU generations show improvements but they're
not as dramatic.  Tests will tell me for sure but I have some confidence
that it'll look like

Small tracing patches -- no difference
Vincent Patches 1-2   -- regressions
Fix from Mel          -- small overall improvement on baseline
Vincent patches 3-5   -- small improvements mostly, sometimes big ones
                         on hackbench depending on the machine
Rest of Mel series    -- generally ok across machines and CPU generations

Even if the improvements are not dramatic, I think it'll be worth it to
have NUMA and CPU balancer using similarly sane logic and overall I find
the load balancer easier to understand with the new logic so yey!

-- 
Mel Gorman
SUSE Labs