Message-ID: <7e60ab36-aecd-8912-0cda-72dba21268d2@arm.com>
Date: Fri, 29 Oct 2021 12:01:06 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
linux-kernel@...r.kernel.org, tim.c.chen@...ux.intel.com
Subject: Re: [PATCH v3 0/5] Improve newidle lb cost tracking and early abort
On 19/10/2021 14:35, Vincent Guittot wrote:
> This patchset updates newidle lb cost tracking and early abort:
>
> The time spent running update_blocked_averages is now accounted in the 1st
> sched_domain level. This time can be significant and move the cost of
> newidle lb above the avg_idle time.
>
> The decay of max_newidle_lb_cost is modified to start only when the field
> has not been updated for a while. A recent update will not be decayed
> immediately but only after a while.
>
> The condition of an avg_idle lower than sysctl_sched_migration_cost has
> been removed as the 500us value is quite large and prevents opportunities to
> pull tasks onto the newly idle CPU for at least the 1st domain levels.
>
> Monitoring sd->max_newidle_lb_cost on cpu0 of an Arm64 system
> THX2 (2 nodes * 28 cores * 4 cpus) during the benchmarks gives the
> following results:
>         min    avg    max
> SMT:    1us    33us   273us - this one includes the update of blocked load
> MC:     7us    49us   398us
> NUMA:   10us   45us   158us
>
>
> Some results for hackbench -l $LOOPS -g $group :
> group   tip/sched/core    + this patchset
> 1       15.189(+/- 2%)    14.987(+/- 2%)    +1%
> 4        4.336(+/- 3%)     4.322(+/- 5%)    +0%
> 16       3.654(+/- 1%)     2.922(+/- 3%)   +20%
> 32       3.209(+/- 1%)     2.919(+/- 3%)    +9%
> 64       2.965(+/- 1%)     2.826(+/- 1%)    +4%
> 128      2.954(+/- 1%)     2.993(+/- 8%)    -1%
> 256      2.951(+/- 1%)     2.894(+/- 1%)    +2%
>
> tbench and reaim have not shown any difference
>
> Change since v2:
> - Update and decay of sd->last_decay_max_lb_cost are gathered in
> update_newidle_cost(). The behavior remains almost the same except that
> the decay can happen during newidle_balance now.
>
> Test results haven't shown any differences
>
> I haven't modified rq->max_idle_balance_cost. It acts as the max value
> for avg_idle and prevents the latter from reaching a high value during a
> long idle phase. Moving to an IIR filter instead could delay the convergence
> of avg_idle to a reasonable value that reflects the current situation.
>
> - Added a minor cleanup of newidle_balance
>
> Change since v1:
> - account the time spent in update_blocked_averages() in the 1st domain
>
> - reduce the number of calls to sched_clock_cpu()
>
> - change the way max_newidle_lb_cost is decayed. Peter suggested using an
> IIR, but keeping track of the current max value gave the best result
>
> - removed the condition (this_rq->avg_idle < sysctl_sched_migration_cost)
> as suggested by Peter
>
> Vincent Guittot (5):
> sched/fair: Account update_blocked_averages in newidle_balance cost
> sched/fair: Skip update_blocked_averages if we are defering load
> balance
> sched/fair: Wait before decaying max_newidle_lb_cost
> sched/fair: Remove sysctl_sched_migration_cost condition
> sched/fair: cleanup newidle_balance
>
> include/linux/sched/topology.h | 2 +-
> kernel/sched/fair.c | 65 ++++++++++++++++++++++------------
> kernel/sched/topology.c | 2 +-
> 3 files changed, 45 insertions(+), 24 deletions(-)
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@....com>
LGTM, just a couple of questions in 3/5 and 4/5.
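
For reference, the "wait before decaying" behaviour described in the cover
letter can be modelled in userspace roughly as below. This is only a sketch
under stated assumptions, not the code from the patch: struct model_sd,
model_update_newidle_cost() and MODEL_HZ are hypothetical names, time is
passed in explicitly instead of being read from jiffies, and both the
one-second wait and the 253/256 decay factor are assumed values (the cover
letter only says "for a while").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sched_domain fields discussed above. */
struct model_sd {
	uint64_t max_newidle_lb_cost;    /* highest newidle lb cost seen (ns) */
	uint64_t last_decay_max_lb_cost; /* tick of the last update or decay */
};

/* Assumed tick rate; the wait before decaying is modelled as one second. */
#define MODEL_HZ 250

/*
 * Record a newly measured cost. A new maximum refreshes the timestamp, so
 * it is not decayed right away. Decay only happens once the field has not
 * been touched for more than MODEL_HZ ticks.
 * Returns true when a decay step was applied.
 */
static bool model_update_newidle_cost(struct model_sd *sd, uint64_t cost,
				      uint64_t now)
{
	if (cost > sd->max_newidle_lb_cost) {
		/* New max: store it and restart the waiting period. */
		sd->max_newidle_lb_cost = cost;
		sd->last_decay_max_lb_cost = now;
	} else if (now > sd->last_decay_max_lb_cost + MODEL_HZ) {
		/* Stale max: decay by roughly 1% (assumed factor). */
		sd->max_newidle_lb_cost = sd->max_newidle_lb_cost * 253 / 256;
		sd->last_decay_max_lb_cost = now;
		return true;
	}

	return false;
}
```

With this model, a max recorded at tick 10 is left untouched by a smaller
sample at tick 200 and is only decayed by a sample arriving after tick 260.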