linux-kernel - Re: [PATCH 0/4] remove runnable_load_avg and improve group

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200211210439.GS3420@suse.de>
Date:   Tue, 11 Feb 2020 21:04:39 +0000
From:   Mel Gorman <mgorman@...e.de>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        linux-kernel@...r.kernel.org, pauld@...hat.com,
        parth@...ux.ibm.com, valentin.schneider@....com
Subject: Re: [PATCH 0/4] remove runnable_load_avg and improve group_classify

On Tue, Feb 11, 2020 at 06:46:47PM +0100, Vincent Guittot wrote:
> NUMA load balancing is the last remaining piece of code that uses the 
> runnable_load_avg of PELT to balance tasks between nodes. The normal
> load_balance has replaced it by a better description of the current state
> of the group of cpus.  The same policy can be applied to the numa
> balancing.
> 
> Once unused, runnable_load_avg can be replaced by a simpler runnable_avg
> signal that tracks the waiting time of tasks on rq. Currently, the state
> of a group of CPUs is defined thanks to the number of running task and the
> level of utilization of rq. But the utilization can be temporarly low
> after the migration of a task whereas the rq is still overloaded with
> tasks. In such case where tasks were competing for the rq, the
> runnable_avg will stay high after the migration.
> 
> Some hackbench results:
> 
> - small arm64 dual quad cores system
> hackbench -l (2560/#grp) -g #grp
> 
> grp    tip/sched/core         +patchset              improvement
> 1       1,327(+/-10,06 %)     1,247(+/-5,45 %)       5,97 %
> 4       1,250(+/- 2,55 %)     1,207(+/-2,12 %)       3,42 %
> 8       1,189(+/- 1,47 %)     1,179(+/-1,93 %)       0,90 %
> 16      1,221(+/- 3,25 %)     1,219(+/-2,44 %)       0,16 %						
> 
> - large arm64 2 nodes / 224 cores system
> hackbench -l (256000/#grp) -g #grp
> 
> grp    tip/sched/core         +patchset              improvement
> 1      14,197(+/- 2,73 %)     13,917(+/- 2,19 %)     1,98 %
> 4       6,817(+/- 1,27 %)      6,523(+/-11,96 %)     4,31 %
> 16      2,930(+/- 1,07 %)      2,911(+/- 1,08 %)     0,66 %
> 32      2,735(+/- 1,71 %)      2,725(+/- 1,53 %)     0,37 %
> 64      2,702(+/- 0,32 %)      2,717(+/- 1,07 %)    -0,53 %
> 128     3,533(+/-14,66 %)     3,123(+/-12,47 %)     11,59 %
> 256     3,918(+/-19,93 %)     3,390(+/- 5,93 %)     13,47 %
> 

I haven't reviewed this yet because by co-incidence I'm finalising a
series that tries to reconcile the load balancer with the NUMA balancer
and it has been very tricky to get right.  One aspect though is that
hackbench is generally not long-running enough to detect any performance
regressions in NUMA balancing. At least I've never observed it to be a
good evaluation for NUMA balancing.

> Without the patchset, there is a significant number of time that a CPU has
> spare capacity with more than 1 running task. Although this is a valid
> case, this is not a state that should often happen when 160 tasks are
> competing on 8 cores like for this test. The patchset fixes the situation
> by taking into account the runnable_avg, which stays high after the
> migration of a task on another CPU.
> 

FWIW, during the rewrite, I ended up moving away from runnable_load to
get the load balancer and NUMA balancer to use the same metrics.

-- 
Mel Gorman
SUSE Labs