[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b4ccaf05-36f7-fdde-dbb4-4ee98d2ebd7a@arm.com>
Date: Mon, 29 Jan 2018 19:43:44 +0100
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>,
Peter Zijlstra <peterz@...radead.org>,
Morten Rasmussen <morten.rasmussen@...s.arm.com>
Cc: Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Brendan Jackman <brendan.jackman@....com>,
Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [RFC PATCH 2/5] sched: Add NOHZ_STATS_KICK
On 01/24/2018 09:25 AM, Vincent Guittot wrote:
> Hi,
>
> Le Thursday 18 Jan 2018 à 10:38:07 (+0000), Morten Rasmussen a écrit :
>> On Mon, Jan 15, 2018 at 09:26:09AM +0100, Vincent Guittot wrote:
>>> Le Wednesday 03 Jan 2018 à 10:16:00 (+0100), Vincent Guittot a écrit :
[...]
>>>
>>> Hi Peter,
>>>
>>> With the patch below on top of your branch, the blocked loads are updated and
>>> decayed regularly. The main differences are:
>>> - It doesn't use a timer to trig ilb but the tick and when a cpu becomes idle.
>>> The main drawback of this solution is that the load is blocked when the
>>> system is fully idle with the advantage of not waking up a fully idle
>>> system. We have to wait for the next tick or newly idle event for updating
>>> blocked load when the system leaves idle stat which can be up to a tick long.
>>> If this is too long, we can check for kicking ilb when task wakes up so the
>>> blocked load will be updated as soon as the system leaves idle state.
>>> The main advantage is that we don't wake up a fully idle system every 32ms to
>>> update blocked load that will be not used.
>>> - I'm working on one more improvement to use nohz_idle_balance in the newly
>>> idle case when the system is not overloaded and
>>> (this_rq->avg_idle > sysctl_sched_migration_cost). In this case, we can try to
>>> use nohz_idle_balance with NOHZ_STATS_KICK and abort as soon as it exceed
>>> this_rq->avg_idle. This will remove some calls to kick_ilb and some wake up
>>> of an idle cpus.
>>
>> This sound like what I meant in my other reply :-)
>>
>> It seems pointless to have a timer to update PELT if the system is
>> completely idle, and when it isn't we can piggy back other events to
>> make the updates happen.
>
> The patch below implements what has been described above. It calls part of
> nohz_idle_balance when a cpu becomes idle and kick a ilb if it takes too much
> time. This removes part of ilb that are kicked on an idle cpu for updating
> the blocked load but the ratio really depends on when the tick happens compared
> to a cpu becoming idle and the 32ms boundary. I have an additionnal patch that
> enables to update the blocked loads when a cpu becomes idle 1 period before
> kicking an ilb and there is far less ilb because we give more chance to the
> newly idle case (time_after is replaced by time_after_eq in idle_balance()).
>
> The patch also uses a function cfs_rq_has_blocked, which only checks the
> util/load_avg, instead of the cfs_rq_is_decayed which check *_sum too. This
> reduce significantly the number of update of blocked load. the *_avg will be
> fully decayed in around 300~400ms but it's far longer for the *_sum which have
> a higher resolution and we can easily reach almost seconds. But only the *_avg
> are used to make decision so keeping some blocked *_sum is acceptable.
>
> ---
> kernel/sched/fair.c | 121 +++++++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 92 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 898785d..ed90303 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7356,6 +7356,17 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
> return true;
> }
>
> +static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq)
> +{
> + if (cfs_rq->avg.load_avg)
> + return true;
> +
> + if (cfs_rq->avg.util_avg)
> + return true;
> +
> + return false;
> +}
> +
Can we not change cfs_rq_is_decayed() to use avg.foo_avg instead of
avg.foo_sum ?
> #ifdef CONFIG_FAIR_GROUP_SCHED
>
> static void update_blocked_averages(int cpu)
> @@ -7393,7 +7404,9 @@ static void update_blocked_averages(int cpu)
> */
> if (cfs_rq_is_decayed(cfs_rq))
> list_del_leaf_cfs_rq(cfs_rq);
> - else
> +
> + /* Don't need periodic decay once load/util_avg are null */
> + if (cfs_rq_has_blocked(cfs_rq))
> done = false;
> }
>
> @@ -7463,7 +7476,7 @@ static inline void update_blocked_averages(int cpu)
> update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
> #ifdef CONFIG_NO_HZ_COMMON
> rq->last_blocked_load_update_tick = jiffies;
> - if (cfs_rq_is_decayed(cfs_rq))
> + if (cfs_rq_has_blocked(cfs_rq))
Schouldn't this be !cfs_rq_has_blocked(cfs_rq) ?
> rq->has_blocked_load = 0;
> #endif
> rq_unlock_irqrestore(rq, &rf);
[...]
> @@ -9438,7 +9451,17 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> */
> if (need_resched()) {
> has_blocked_load = true;
> - break;
> + goto abort;
> + }
> +
> + /*
> + * If the update is done while CPU becomes idle, we abort
> + * the update when its cost is higher than the average idle
> + * time in orde to not delay a possible wake up.
> + */
> + if (idle == CPU_NEWLY_IDLE && this_rq->avg_idle < curr_cost) {
> + has_blocked_load = true;
> + goto abort;
> }
>
> rq = cpu_rq(balance_cpu);
> @@ -9453,10 +9476,10 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> if (time_after_eq(jiffies, rq->next_balance)) {
> struct rq_flags rf;
>
> - rq_lock_irq(rq, &rf);
> + rq_lock_irqsave(rq, &rf);
> update_rq_clock(rq);
> cpu_load_update_idle(rq);
> - rq_unlock_irq(rq, &rf);
> + rq_unlock_irqrestore(rq, &rf);
>
> if (flags & NOHZ_BALANCE_KICK)
> rebalance_domains(rq, CPU_IDLE);
> @@ -9466,10 +9489,17 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
> next_balance = rq->next_balance;
> update_next_balance = 1;
> }
Why do you do this cpu_load_update_idle(rq) even this was called with
CPU_NEWLY_IDLE? Wouldn't it be sufficient to jump to the curr_cost
calculation in this case?
> +
> + domain_cost = sched_clock_cpu(this_cpu) - t0;
> + curr_cost += domain_cost;
> +
> }
>
> - update_blocked_averages(this_cpu);
> - has_blocked_load |= this_rq->has_blocked_load;
> + /* Newly idle CPU doesn't need an update */
> + if (idle != CPU_NEWLY_IDLE) {
> + update_blocked_averages(this_cpu);
> + has_blocked_load |= this_rq->has_blocked_load;
> + }
>
> if (flags & NOHZ_BALANCE_KICK)
> rebalance_domains(this_rq, CPU_IDLE);
[...]
Powered by blists - more mailing lists