Message-Id: <a095f3ef-10d8-b58b-4d84-ac4b06fd91d1@linux.ibm.com>
Date: Sun, 23 Feb 2020 11:38:23 +0530
From: Parth Shah <parth@...ux.ibm.com>
To: Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, linux-kernel@...r.kernel.org
Cc: pauld@...hat.com, valentin.schneider@....com, hdanton@...a.com
Subject: Re: [PATCH v4 0/5] remove runnable_load_avg and improve
group_classify
Hi,
On 2/21/20 6:57 PM, Vincent Guittot wrote:
> This new version stays quite close to the previous one and should,
> without problems, replace the previous version that is part of Mel's
> patchset:
> https://lkml.org/lkml/2020/2/14/156
>
> NUMA load balancing is the last remaining piece of code that uses
> PELT's runnable_load_avg to balance tasks between nodes. The normal
> load_balance has already replaced it with a better description of the
> current state of the group of CPUs, and the same policy can be applied
> to NUMA balancing.
>
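
As a rough illustration of the "better description of the current state
of the group" mentioned above, here is a simplified, standalone sketch.
The struct fields, thresholds and names are illustrative only, not the
kernel's actual group_classify():

#include <stdio.h>

/*
 * Simplified illustration: classify a group of CPUs by its current
 * state (task count and utilization) instead of by runnable_load_avg.
 * All fields and thresholds below are illustrative.
 */
struct group_stats {
	unsigned int nr_running;   /* tasks currently on the rqs */
	unsigned int group_weight; /* number of CPUs in the group */
	unsigned long util;        /* summed utilization of the rqs */
	unsigned long capacity;    /* summed CPU capacity */
};

enum group_state { HAS_SPARE, FULLY_BUSY, OVERLOADED };

static enum group_state classify(const struct group_stats *gs)
{
	/* More tasks than CPUs and high utilization: overloaded */
	if (gs->nr_running > gs->group_weight &&
	    gs->util * 100 > gs->capacity * 80)
		return OVERLOADED;

	/* Roughly one task per CPU: fully busy */
	if (gs->nr_running >= gs->group_weight)
		return FULLY_BUSY;

	return HAS_SPARE;
}

int main(void)
{
	struct group_stats gs = { .nr_running = 6, .group_weight = 4,
				  .util = 3800, .capacity = 4096 };

	/* 6 tasks on 4 nearly full CPUs: prints state = 2 (OVERLOADED) */
	printf("state = %d\n", classify(&gs));
	return 0;
}
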
> Once unused, runnable_load_avg can be replaced by a simpler
> runnable_avg signal that tracks the waiting time of tasks on the rq.
> Currently, the state of a group of CPUs is defined by the number of
> running tasks and the level of utilization of the rq. But utilization
> can be temporarily low after the migration of a task, even though the
> rq is still overloaded with tasks. In such a case, where tasks were
> competing for the rq, runnable_avg will stay high after the migration.
>
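
The point that runnable_avg keeps the rq's recent contention visible
even when utilization drops can be seen with a toy PELT-like
accumulation. The decay factor and window handling are simplified and
illustrative, not the kernel's PELT code:

#include <stdio.h>

/*
 * Toy PELT-like signal: each window, the running average decays
 * geometrically and accumulates a new contribution. util only counts
 * windows where the task actually ran; runnable also counts windows
 * where it waited on the rq. The constant is illustrative.
 */
#define DECAY 0.978  /* illustrative per-window decay factor */

static double update(double avg, int contributes)
{
	return avg * DECAY + (contributes ? (1.0 - DECAY) : 0.0);
}

int main(void)
{
	double util = 0.0, runnable = 0.0;
	int w;

	/*
	 * Two tasks compete for one CPU: each runs half of the time,
	 * but is runnable (running or waiting) all of the time.
	 */
	for (w = 0; w < 300; w++) {
		util = update(util, w % 2 == 0); /* runs every other window */
		runnable = update(runnable, 1);  /* always on the rq */
	}
	printf("while competing: util=%.2f runnable=%.2f\n", util, runnable);
	return 0;
}

With two tasks alternating on one CPU, util settles near 0.5 while
runnable settles near 1.0, so the time spent waiting on the rq is still
visible in runnable_avg right after one of the tasks migrates away.
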
> Some hackbench results:
>
> - small arm64 dual quad cores system
> hackbench -l (2560/#grp) -g #grp
>
> grp tip/sched/core +patchset improvement
> 1 1,327(+/-10,06 %) 1,247(+/-5,45 %) 5,97 %
> 4 1,250(+/- 2,55 %) 1,207(+/-2,12 %) 3,42 %
> 8 1,189(+/- 1,47 %) 1,179(+/-1,93 %) 0,90 %
> 16 1,221(+/- 3,25 %) 1,219(+/-2,44 %) 0,16 %
>
> - large arm64 2 nodes / 224 cores system
> hackbench -l (256000/#grp) -g #grp
>
> grp tip/sched/core +patchset improvement
> 1 14,197(+/- 2,73 %) 13,917(+/- 2,19 %) 1,98 %
> 4 6,817(+/- 1,27 %) 6,523(+/-11,96 %) 4,31 %
> 16 2,930(+/- 1,07 %) 2,911(+/- 1,08 %) 0,66 %
> 32 2,735(+/- 1,71 %) 2,725(+/- 1,53 %) 0,37 %
> 64 2,702(+/- 0,32 %) 2,717(+/- 1,07 %) -0,53 %
> 128 3,533(+/-14,66 %) 3,123(+/-12,47 %) 11,59 %
> 256 3,918(+/-19,93 %) 3,390(+/- 5,93 %) 13,47 %
[...]
I performed a similar experiment on an IBM POWER9 system with 2 nodes
and 44 cores (22 cores per node). Times are in seconds; lower is better.
- hackbench -l (256000/#grp) -g #grp
+-----+----------------+-------+
| grp | tip/sched/core | v4    |
+-----+----------------+-------+
|   1 |          76.97 | 76.31 |
|   4 |          56.56 | 56.86 |
|   8 |          54.23 | 54.25 |
|  16 |          53.94 | 53.24 |
|  32 |          54.10 | 54.01 |
|  64 |          54.38 | 54.35 |
| 128 |          55.11 | 55.08 |
| 256 |          55.97 | 56.04 |
| 512 |          54.81 | 55.50 |
+-----+----------------+-------+
- The deviation in the results is very marginal (< 1%).
The results show no change with respect to hackbench. I will do
further benchmarking to see if any observable changes occur.
- Parth