linux-kernel - Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting

Message-ID: <20170222151802.GB2576@lerouge>
Date:   Wed, 22 Feb 2017 16:18:05 +0100
From:   Frederic Weisbecker <fweisbec@...il.com>
To:     Matt Fleming <matt@...eblueprint.co.uk>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        stable@...r.kernel.org,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH v2 1/2] sched/loadavg: Avoid loadavg spikes caused by
 delayed NO_HZ accounting

On Fri, Feb 17, 2017 at 12:07:30PM +0000, Matt Fleming wrote:
> If we crossed a sample window while in NO_HZ, we will add LOAD_FREQ to
> the pending sample window time on exit, setting the next update not
> one window into the future, but two.
> 
> This situation on exiting NO_HZ is described by:
> 
>   this_rq->calc_load_update < jiffies < calc_load_update
> 
> In this scenario, what we should be doing is:
> 
>   this_rq->calc_load_update = calc_load_update		     [ next window ]
> 
> But what we actually do is:
> 
>   this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
> 
> This has the effect of delaying load average updates by potentially
> up to ~9 seconds.
> 
> This can result in huge spikes in the load average values due to
> per-cpu uninterruptible task counts being out of sync when accumulated
> across all CPUs.
> 
> It's safe to update the per-cpu active count if we wake between sample
> windows, because any load that we left in 'calc_load_idle' will have
> been zeroed when the idle load was folded in calc_global_load().
> 
> This issue is easy to reproduce on kernels that predate
> 
>   commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking")
> 
> simply by forking short-lived process pipelines built from ps(1) and
> grep(1) in a loop. I'm unable to reproduce the spikes after that
> commit, but code review suggests the bug is still present.
> 
> Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Mike Galbraith <umgwanakikbuti@...il.com>
> Cc: Morten Rasmussen <morten.rasmussen@....com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>

Acked-by: Frederic Weisbecker <fweisbec@...il.com>

Thanks, it's much clearer now!