Message-ID: <002901d1547a$f5309480$df91bd80$@net>
Date: Thu, 21 Jan 2016 10:38:43 -0800
From: "Doug Smythies" <dsmythies@...us.net>
To: "'Peter Zijlstra'" <peterz@...radead.org>,
"'Vik Heyndrickx'" <vik.heyndrickx@...ibox.net>
Cc: <linux-kernel@...r.kernel.org>,
"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [PATCH] sched: loadavg 0.00, 0.01, 0.05 on idle
On 2016.01.21 07:29 Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 10:23:25AM +0100, Vik Heyndrickx wrote:
>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they have
>> no load at all.
>> ---
>> Subject: sched: Fix non-zero idle loadavg
>> From: Vik Heyndrickx <vik.heyndrickx@...ibox.net>
>> Date: Thu, 21 Jan 2016 10:23:25 +0100
>> Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
>> have no load at all.
>> Removing the single line of code that rounds the internally kept
>> load value, which effectively returns the function calc_load to its
>> previous state, completely fixes the visualization problem.
Yes, but it introduces a systematic error in place of the current
balanced error, and so doubles the maximum error caused by the finite
number of bits used in the math.
>> Once the (old) load becomes 93 or higher, it mathematically can never
>> get lower than 93, even when the active (load) remains 0 forever.
>> This results in the strange 0.00, 0.01, 0.05 uptime values on idle
>> systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.
As I mentioned on the bug report [1], this is a consequence
of carrying a finite number of bits with such a strong
IIR (Infinite Impulse Response) filter coefficient.
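As a rough illustration of the stuck value discussed above, here is a minimal sketch of the fixed-point update, written in Python rather than the kernel's C. The constants 2048 and 2037 are the ones quoted in this thread. The names FSHIFT, FIXED_1, EXP_15, calc_load and settle, and the `rounding` switch, are assumptions made for this sketch, which models only the arithmetic and nothing else the scheduler does.

```python
FSHIFT = 11              # assumed fractional bits, since 2**11 == 2048
FIXED_1 = 1 << FSHIFT    # 2048 represents a load of 1.00
EXP_15 = 2037            # 15 minute coefficient quoted in this thread


def calc_load(old, exp, active, rounding=True):
    """One fixed-point update step (illustrative, not the kernel source)."""
    new = old * exp + active * (FIXED_1 - exp)
    if rounding:
        new += FIXED_1 // 2   # the +0.5 rounding (1024/2048) under discussion
    return new >> FSHIFT      # integer divide by 2048, truncating


def settle(old, exp, active, rounding=True, max_steps=100_000):
    """Repeat the update until the value stops changing."""
    for _ in range(max_steps):
        new = calc_load(old, exp, active, rounding)
        if new == old:
            break
        old = new
    return old


# Idle system (active == 0), starting from a load of 1.00.
floor_rounded = settle(FIXED_1, EXP_15, 0, rounding=True)
floor_truncated = settle(FIXED_1, EXP_15, 0, rounding=False)
print(floor_rounded, floor_rounded / FIXED_1)
print(floor_truncated)
```

With rounding, the idle value in this sketch stops decaying at 93 (93/2048 = 0.0454, displayed as 0.05). Without rounding it decays all the way to 0.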
>> It is not correct to add a 0.5 rounding (=1024/2048) here, since the
>> result from this function is fed back into the next iteration again,
>> so the result of that +0.5 rounding value then gets multiplied by
>> (2048-2037), and then rounded again, so there is a virtual "ghost"
>> load created, next to the old and active load terms.
If you do not round, the problem doubles on the load-increasing
side. Consider an old load value of 1862 (90.92%),
regardless of how it got there, and a new load value of 2048 (100%)
from here onwards. With this proposed change, the 15 minute math becomes:
new = (old * 2037 + load * (2048 - 2037)) / 2048
new = (1862 * 2037 + 2048 * (2048 - 2037)) / 2048
new = 3815422 / 2048 = 1862 (1862.999 truncated by the integer divide)
So, a 100% load will always be shown as 91%, an error of about 9%, which
is double the roughly 4.5% limit with the rounding in place.
I have been running this proposed code with 100% load on CPU 7 for a couple
of hours now, and the 15 minute load average is stuck at 0.91.
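The load-increasing case can be checked with the same kind of sketch, again in Python rather than the kernel's C. It holds the input at 100% (2048) and iterates the 15 minute update from zero, with and without the rounding term. As before, the names FSHIFT, FIXED_1, EXP_15, calc_load and settle are assumptions made for illustration, and only the arithmetic is modelled.

```python
FSHIFT = 11              # assumed fractional bits, since 2**11 == 2048
FIXED_1 = 1 << FSHIFT    # 2048 represents a load of 1.00 (100%)
EXP_15 = 2037            # 15 minute coefficient quoted in this thread


def calc_load(old, exp, active, rounding):
    """One fixed-point update step (illustrative, not the kernel source)."""
    new = old * exp + active * (FIXED_1 - exp)
    if rounding:
        new += FIXED_1 // 2   # the +0.5 rounding term
    return new >> FSHIFT      # integer divide by 2048, truncating


def settle(old, exp, active, rounding, max_steps=100_000):
    """Repeat the update until the value stops changing."""
    for _ in range(max_steps):
        new = calc_load(old, exp, active, rounding)
        if new == old:
            break
        old = new
    return old


# Constant 100% load, starting from an idle value of 0.
ceiling_truncated = settle(0, EXP_15, FIXED_1, rounding=False)
ceiling_rounded = settle(0, EXP_15, FIXED_1, rounding=True)

print(ceiling_truncated, round(ceiling_truncated / FIXED_1, 2))
print(ceiling_rounded, round(ceiling_rounded / FIXED_1, 2))
print(FIXED_1 - ceiling_truncated, FIXED_1 - ceiling_rounded)
```

In this sketch the value without rounding stops at 1862 (0.91) and the value with rounding stops at 1955 (0.95). The shortfall from 2048 is 186 counts versus 93 counts, which is the doubling described above.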
Myself, I would not take out the rounding, but I defer to Peter.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=45001