linux-kernel - RE: [ 56/75] sched: Fix nohz load accounting -- again!

Message-ID: <001601cd2a41$c83798a0$58a6c9e0$@net>
Date:	Fri, 4 May 2012 15:03:00 -0700
From:	"Doug Smythies" <dsmythies@...us.net>
To:	"'Greg KH'" <gregkh@...uxfoundation.org>,
	<linux-kernel@...r.kernel.org>, <stable@...r.kernel.org>
Cc:	<torvalds@...ux-foundation.org>, <akpm@...ux-foundation.org>,
	<alan@...rguk.ukuu.org.uk>,
	'Lesław Kopeć' 
	<leslaw.kopec@...za-klasa.pl>, "'Aman Gupta'" <aman@...1.net>,
	"'Peter Zijlstra'" <a.p.zijlstra@...llo.nl>,
	"'Ingo Molnar'" <mingo@...e.hu>,
	"'Kerin Millar'" <kerframil@...il.com>,
	"Doug Smythies" <dsmythies@...us.net>
Subject: RE: [ 56/75] sched: Fix nohz load accounting -- again!

Earlier this morning, an issue was raised against this patch on the related Ubuntu Launchpad thread.
The complaint is that reported load averages are now too high when a very lightly loaded cpu enters and exits idle at a "high" frequency.
Most of my testing was done with medium to heavy load on the cpu and relatively short idle periods. This is the opposite case.
For a quick test, I hacked up my test program and was able to reproduce the issue.
I am still trying to understand it better and to determine the lower bound of "high" frequency (I think it is 25 Hertz, and scales from there down, but I have no proof yet). I will also go back and test this scenario without the patch. I'll let this list know the results, but it might be a few days.
My quick test results are attached.

Doug Smythies

References (even though I have been told not to include links in e-mails to this list):

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/838811 <<< starting from post #52

version:
doug@s15:~/c$ uname -a
Linux s15 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
doug@s15:~/c$ cat /proc/version
Linux version 3.2.0-24-generic (buildd@...low) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC 2012
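
The test program mentioned in the reply above is not included in the message. As a rough illustration only, the scenario it describes (a very lightly loaded cpu entering and leaving idle many times per second) could be approximated with a sketch like the one below. The names split_period() and run_light_load(), the 25 Hz figure in the usage note, and the busy percentage are all assumptions made for this example, not details taken from the original test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: split one period of a freq_hz cycle into a
 * busy part and an idle part, both in nanoseconds.
 */
static void split_period(unsigned int freq_hz, unsigned int busy_percent,
			 int64_t *busy_ns, int64_t *idle_ns)
{
	assert(freq_hz > 0 && busy_percent <= 100);

	int64_t period_ns = 1000000000LL / freq_hz;

	*busy_ns = period_ns * busy_percent / 100;
	*idle_ns = period_ns - *busy_ns;
}

/* Monotonic clock reading in nanoseconds. */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical load generator: for each cycle, spin for the busy part
 * of the period, then sleep for the rest so the cpu can go idle.
 */
static void run_light_load(unsigned int freq_hz, unsigned int busy_percent,
			   long cycles)
{
	int64_t busy_ns, idle_ns;

	split_period(freq_hz, busy_percent, &busy_ns, &idle_ns);

	for (long i = 0; i < cycles; i++) {
		int64_t end = now_ns() + busy_ns;

		while (now_ns() < end)
			;	/* busy phase: keep the cpu runnable */

		struct timespec idle = {
			.tv_sec  = idle_ns / 1000000000LL,
			.tv_nsec = idle_ns % 1000000000LL,
		};
		nanosleep(&idle, NULL);	/* idle phase */
	}
}
```

Calling run_light_load(25, 5, 1500) from a main() would give about one minute of 25 Hz idle entry/exit at roughly 5% load, during which /proc/loadavg can be watched. The actual sleep length depends on timer resolution, so the real frequency may be somewhat lower than requested.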

-----Original Message-----
From: Greg KH [mailto:gregkh@...uxfoundation.org] 
Sent: May-04-2012 13:43
To: linux-kernel@...r.kernel.org; stable@...r.kernel.org
Cc: torvalds@...ux-foundation.org; akpm@...ux-foundation.org; alan@...rguk.ukuu.org.uk; Doug Smythies; Lesław Kopeć; Aman Gupta; Peter Zijlstra; Ingo Molnar; Kerin Millar
Subject: [ 56/75] sched: Fix nohz load accounting -- again!

3.3-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@...radead.org>

commit c308b56b5398779cd3da0f62ab26b0453494c3d4 upstream.

Various people reported nohz load tracking still being wrecked, but Doug spotted the actual problem. We fold the nohz remainder in too soon, causing us to lose samples and under-account.

So instead of playing catch-up up-front, always do a single load-fold with whatever state we encounter and only then fold the nohz remainder and play catch-up.

Reported-by: Doug Smythies <dsmythies@...us.net>
Reported-by: Lesław Kopeć <leslaw.kopec@...za-klasa.pl>
Reported-by: Aman Gupta <aman@...1.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Link: http://lkml.kernel.org/n/tip-4v31etnhgg9kwd6ocgx3rxl8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@...e.hu>
Cc: Kerin Millar <kerframil@...il.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>

---
 kernel/sched/core.c |   53 +++++++++++++++++++++++++---------------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2266,13 +2266,10 @@ calc_load_n(unsigned long load, unsigned
  * Once we've updated the global active value, we need to apply the exponential
  * weights adjusted to the number of cycles missed.
  */
-static void calc_global_nohz(unsigned long ticks)
+static void calc_global_nohz(void)
 {
 	long delta, active, n;
 
-	if (time_before(jiffies, calc_load_update))
-		return;
-
 	/*
 	 * If we crossed a calc_load_update boundary, make sure to fold
 	 * any pending idle changes, the respective CPUs might have
@@ -2284,31 +2281,25 @@ static void calc_global_nohz(unsigned lo
 		atomic_long_add(delta, &calc_load_tasks);
 
 	/*
-	 * If we were idle for multiple load cycles, apply them.
+	 * It could be the one fold was all it took, we done!
 	 */
-	if (ticks >= LOAD_FREQ) {
-		n = ticks / LOAD_FREQ;
+	if (time_before(jiffies, calc_load_update + 10))
+		return;
 
-		active = atomic_long_read(&calc_load_tasks);
-		active = active > 0 ? active * FIXED_1 : 0;
+	/*
+	 * Catch-up, fold however many we are behind still
+	 */
+	delta = jiffies - calc_load_update - 10;
+	n = 1 + (delta / LOAD_FREQ);
 
-		avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
-		avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
-		avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
+	active = atomic_long_read(&calc_load_tasks);
+	active = active > 0 ? active * FIXED_1 : 0;
 
-		calc_load_update += n * LOAD_FREQ;
-	}
+	avenrun[0] = calc_load_n(avenrun[0], EXP_1, active, n);
+	avenrun[1] = calc_load_n(avenrun[1], EXP_5, active, n);
+	avenrun[2] = calc_load_n(avenrun[2], EXP_15, active, n);
 
-	/*
-	 * Its possible the remainder of the above division also crosses
-	 * a LOAD_FREQ period, the regular check in calc_global_load()
-	 * which comes after this will take care of that.
-	 *
-	 * Consider us being 11 ticks before a cycle completion, and us
-	 * sleeping for 4*LOAD_FREQ + 22 ticks, then the above code will
-	 * age us 4 cycles, and the test in calc_global_load() will
-	 * pick up the final one.
-	 */
+	calc_load_update += n * LOAD_FREQ;
 }
 #else
 void calc_load_account_idle(struct rq *this_rq)
@@ -2320,7 +2311,7 @@ static inline long calc_load_fold_idle(v
 	return 0;
 }
 
-static void calc_global_nohz(unsigned long ticks)
+static void calc_global_nohz(void)
 {
 }
 #endif
@@ -2348,8 +2339,6 @@ void calc_global_load(unsigned long tick
 {
 	long active;
 
-	calc_global_nohz(ticks);
-
 	if (time_before(jiffies, calc_load_update + 10))
 		return;
 
@@ -2361,6 +2350,16 @@ void calc_global_load(unsigned long tick
 	avenrun[2] = calc_load(avenrun[2], EXP_15, active);
 
 	calc_load_update += LOAD_FREQ;
+
+	/*
+	 * Account one period with whatever state we found before
+	 * folding in the nohz state and ageing the entire idle period.
+	 *
+	 * This avoids loosing a sample when we go idle between
+	 * calc_load_account_active() (10 ticks ago) and now and thus
+	 * under-accounting.
+	 */
+	calc_global_nohz();
 }
 
 /*



View attachment "high01.txt" of type "text/plain" (4668 bytes)
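
The catch-up arithmetic in the patch above can be tried out in userspace. The sketch below is a model only, not the kernel code: HZ=250 is an assumed configuration, catch_up_periods() is a hypothetical name for the "n = 1 + (delta / LOAD_FREQ)" step, and calc_load(), fixed_power_int() and calc_load_n() are written from memory of kernel/sched/core.c of that era, so they should be checked against the actual tree before being relied on.

```c
#include <assert.h>

#define HZ		250		/* assumption: CONFIG_HZ=250 */
#define FSHIFT		11		/* bits of fixed-point precision */
#define FIXED_1		(1UL << FSHIFT)	/* 1.0 in fixed point */
#define LOAD_FREQ	(5 * HZ + 1)	/* one load sample every ~5 seconds */
#define EXP_1		1884UL		/* 1/exp(5sec/1min) in fixed point */

/* One exponential-decay step: load = load*exp + active*(1 - exp). */
static unsigned long calc_load(unsigned long load, unsigned long exp,
			       unsigned long active)
{
	assert(exp <= FIXED_1);		/* model sanity check, not in the kernel */

	load *= exp;
	load += active * (FIXED_1 - exp);
	load += 1UL << (FSHIFT - 1);	/* round to nearest */
	return load >> FSHIFT;
}

/* x^n in fixed point, by repeated squaring. */
static unsigned long fixed_power_int(unsigned long x, unsigned int frac_bits,
				     unsigned int n)
{
	unsigned long result = 1UL << frac_bits;

	while (n) {
		if (n & 1) {
			result *= x;
			result += 1UL << (frac_bits - 1);
			result >>= frac_bits;
		}
		n >>= 1;
		if (!n)
			break;
		x *= x;
		x += 1UL << (frac_bits - 1);
		x >>= frac_bits;
	}
	return result;
}

/* Age the load by n missed periods in a single step. */
static unsigned long calc_load_n(unsigned long load, unsigned long exp,
				 unsigned long active, unsigned int n)
{
	return calc_load(load, fixed_power_int(exp, FSHIFT, n), active);
}

/*
 * Hypothetical model of the new calc_global_nohz() bookkeeping: how many
 * periods to fold when "now" is compared against calc_load_update.
 * Returns 0 when the single regular fold was all it took.
 */
static unsigned long catch_up_periods(unsigned long now, unsigned long update)
{
	long behind = (long)(now - (update + 10));	/* wrap-safe, like time_before() */

	if (behind < 0)
		return 0;
	return 1 + (unsigned long)behind / LOAD_FREQ;
}
```

With these definitions, a load of 1.00 (FIXED_1) aged over four idle periods via calc_load_n(FIXED_1, EXP_1, 0, 4) lands close to what four successive calc_load() calls produce, which is the point of folding the whole idle span at once after the regular single fold.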