lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <1270153421-9199-1-git-send-email-chase.douglas@canonical.com>
Date:	Thu,  1 Apr 2010 16:23:41 -0400
From:	Chase Douglas <chase.douglas@...onical.com>
To:	linux-kernel@...r.kernel.org
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	kernel-team <kernel-team@...ts.ubuntu.com>
Subject: [REGRESSION 2.6.30][PATCH v2] sched: update load count only once per cpu in 10 tick update window

A task that often runs for less than 10 ticks at a time is likely to be
left out of the load avg calculation. It is possible to craft a task
that is runnable 90% of the time, but sleeps at least once every 10
ticks. When run on an otherwise idle system, the load avg will remain
near 0.00, even though the cpu usage is 90%.

There's a period of 10 ticks where calc_load_tasks is updated by all the
cpus for the load avg. Usually all the cpus do this during the first
tick. If any cpus go idle, calc_load_tasks is decremented accordingly.
However, if they wake up calc_load_tasks is not incremented. Thus, if
cpus go idle during the 10 tick period, calc_load_tasks may be
decremented to a non-representative value. This issue can lead to
systems having a load avg of exactly 0, even though the real load avg
could theoretically be up to NR_CPUS.

This is a regression since 2.6.30. The offending commit is:
dce48a84adf1806676319f6f480e30a6daa012f9.

This change defers calc_load_tasks accounting after each cpu updates the
count until after the 10 tick update window.

A few points:

* A global atomic deferral counter, and not per-cpu vars, is needed
  because a cpu may go NOHZ idle and not be able to update the global
  calc_load_tasks variable for subsequent load calculations.
* It is not enough to add calls to account for the load when a cpu is
  awakened:
  - Load avg calculation must be independent of cpu load.
  - If a cpu is awakend by one tasks, but then has more scheduled before
    the end of the update window, only the first task will be accounted.

BugLink: http://bugs.launchpad.net/bugs/513848

Signed-off-by: Chase Douglas <chase.douglas@...onical.com>
---
 kernel/sched.c |   24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 81ede13..bc5233f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2967,6 +2967,7 @@ unsigned long nr_iowait(void)
 
 /* Variables and functions for calc_load */
 static atomic_long_t calc_load_tasks;
+static atomic_long_t calc_load_tasks_deferred;
 static unsigned long calc_load_update;
 unsigned long avenrun[3];
 EXPORT_SYMBOL(avenrun);
@@ -3021,7 +3022,7 @@ void calc_global_load(void)
  */
 static void calc_load_account_active(struct rq *this_rq)
 {
-	long nr_active, delta;
+	long nr_active, delta, deferred;
 
 	nr_active = this_rq->nr_running;
 	nr_active += (long) this_rq->nr_uninterruptible;
@@ -3029,6 +3030,25 @@ static void calc_load_account_active(struct rq *this_rq)
 	if (nr_active != this_rq->calc_load_active) {
 		delta = nr_active - this_rq->calc_load_active;
 		this_rq->calc_load_active = nr_active;
+
+		/*
+		 * Update calc_load_tasks only once per cpu in 10 tick update
+		 * window.
+		 */
+		if (unlikely(time_before(jiffies, this_rq->calc_load_update) &&
+			     time_after_eq(jiffies, calc_load_update))) {
+			if (delta)
+				atomic_long_add(delta,
+						&calc_load_tasks_deferred);
+			return;
+		}
+
+		if (calc_load_tasks_deferred.counter) {
+			deferred = atomic_long_xchg(&calc_load_tasks_deferred,
+						    0);
+			delta += deferred;
+		}
+
 		atomic_long_add(delta, &calc_load_tasks);
 	}
 }
@@ -3072,8 +3092,8 @@ static void update_cpu_load(struct rq *this_rq)
 	}
 
 	if (time_after_eq(jiffies, this_rq->calc_load_update)) {
-		this_rq->calc_load_update += LOAD_FREQ;
 		calc_load_account_active(this_rq);
+		this_rq->calc_load_update += LOAD_FREQ;
 	}
 }
 
-- 
1.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ