Message-ID: <20170215151210.GA6691@lerouge>
Date: Wed, 15 Feb 2017 16:12:11 +0100
From: Frederic Weisbecker <fweisbec@...il.com>
To: Matt Fleming <matt@...eblueprint.co.uk>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
Mike Galbraith <umgwanakikbuti@...il.com>,
Morten Rasmussen <morten.rasmussen@....com>,
stable@...r.kernel.org,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
On Wed, Feb 08, 2017 at 01:29:24PM +0000, Matt Fleming wrote:
> The calculation for the next sample window when exiting NO_HZ idle
> does not handle the fact that we may not have reached the next sample
> window yet
That sentence is hard to parse. It took me some time to figure out that
the two occurrences of "next sample window" may not refer to the same thing.
Maybe it would be clearer with something along the lines of:
"The calculation for the next sample window when exiting NO_HZ
does not handle the fact that we may not have crossed any sample
window during the NO_HZ period."
> If we wake from NO_HZ idle after the pending this_rq->calc_load_update
> window time when we want idle but before the next sample window
That one was hard to understand too. How about:
"If we enter NO_HZ mode after a pending this_rq->calc_load_update
and we exit from NO_HZ mode before the forthcoming sample window, ..."
> we will add an unnecessary LOAD_FREQ delay to the load average
> accounting, delaying any update for potentially ~9 seconds.
>
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
>
> It's safe to update the per-cpu active count if we wake between sample
> windows because any load that we left in 'calc_load_idle' will have
> been zero'd when the idle load was folded in calc_global_load().
>
> This issue is easy to reproduce before,
>
> commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
>
> just by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but the bug still seems to be present from code review.
>
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Mike Galbraith <umgwanakikbuti@...il.com>
> Cc: Morten Rasmussen <morten.rasmussen@....com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: <stable@...r.kernel.org> # v3.5+
> Signed-off-by: Matt Fleming <matt@...eblueprint.co.uk>
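To make the window arithmetic in the changelog above concrete, here is a
minimal user-space sketch of the scenario, i.e. this_rq->calc_load_update <
jiffies < calc_load_update on NO_HZ exit. It is not the kernel code: the
function names, the HZ value and the 10-tick grace period are assumptions
made for illustration, and the "synced" variant is only one possible
ordering, not necessarily the patch under review.

```c
#include <stdbool.h>

#define HZ        250UL          /* assumed tick rate, illustration only */
#define LOAD_FREQ (5 * HZ + 1)   /* length of one sample window, in ticks */

/* Wraparound-safe "a is before b", in the spirit of time_before(). */
static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical model of the exit path as the changelog describes it:
 * the early return is checked against the stale per-CPU window, so a
 * CPU that slept across a window falls through and gets LOAD_FREQ
 * added on top of the global window.
 */
unsigned long nohz_exit_old(unsigned long now, unsigned long rq_update,
			    unsigned long global_update)
{
	if (before(now, rq_update))
		return rq_update;

	rq_update = global_update;
	if (before(now, rq_update + 10))
		rq_update += LOAD_FREQ;
	return rq_update;
}

/*
 * Hypothetical alternative: sync to the global window first, then check
 * whether we are still before it. The stale per-CPU value is ignored.
 */
unsigned long nohz_exit_synced(unsigned long now, unsigned long rq_update,
			       unsigned long global_update)
{
	unsigned long next = global_update;

	(void)rq_update;
	if (before(now, next))
		return next;
	if (before(now, next + 10))
		next += LOAD_FREQ;
	return next;
}
```

With a per-CPU window pending at tick 1000, the global window already moved
to 1000 + LOAD_FREQ and a wakeup at tick 1500, the first function schedules
the next per-CPU update two windows ahead, while the second keeps it at the
next global window.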
I'll comment on the change itself in reply to Peter's proposal.
Thanks!