Message-Id: <1271200751-18697-1-git-send-email-chase.douglas@canonical.com>
Date: Tue, 13 Apr 2010 16:19:11 -0700
From: Chase Douglas <chase.douglas@...onical.com>
To: linux-kernel@...r.kernel.org
Cc: Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
kernel-team <kernel-team@...ts.ubuntu.com>
Subject: [REGRESSION 2.6.30][PATCH v3] sched: update load count only once per cpu in 10 tick update window

There is a window of 10 ticks during which every cpu folds its active
task count into calc_load_tasks for the load avg calculation. Usually
all cpus do this during the first tick. If a cpu goes idle within the
window, calc_load_tasks is decremented accordingly. However, if that cpu
wakes up again, calc_load_tasks is not incremented. Thus, if cpus go
idle during the 10 tick window, calc_load_tasks may be decremented to a
non-representative value. This can lead to systems reporting a load avg
of exactly 0, even though the real load avg could theoretically be up
to NR_CPUS.
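
The undercount described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not
kernel code: struct cpu_model, fold() and sample_after_idle_blip() are
hypothetical names, a plain long stands in for the atomic counter, and
a single cpu is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the global atomic calc_load_tasks. */
static long calc_load_tasks;

struct cpu_model {
	long nr_active;        /* runnable + uninterruptible tasks */
	long calc_load_active; /* value this cpu last folded in */
};

/* Fold this cpu's change in active tasks into the global count. */
static void fold(struct cpu_model *cpu)
{
	long delta;

	assert(cpu != NULL);
	delta = cpu->nr_active - cpu->calc_load_active;
	cpu->calc_load_active = cpu->nr_active;
	calc_load_tasks += delta;
}

/* One busy cpu idles briefly inside the window, then wakes up. */
static long sample_after_idle_blip(void)
{
	struct cpu_model cpu = { .nr_active = 1, .calc_load_active = 0 };

	calc_load_tasks = 0;

	fold(&cpu);        /* first tick of the window: count becomes 1 */

	cpu.nr_active = 0;
	fold(&cpu);        /* cpu goes idle: count drops to 0 */

	cpu.nr_active = 1; /* cpu wakes up, but nothing folds the count */

	return calc_load_tasks; /* what the load avg would sample */
}
```

In this model the sampled count is 0 although one task is running when
the window closes.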

Once a cpu has updated the count, this change defers any further
calc_load_tasks accounting for that cpu until after the 10 tick update
window.
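
The deferral can be sketched in user space as follows. This is a
simplified model and not the kernel implementation: struct rq_model,
account_active() and sample_with_deferral() are hypothetical names,
plain longs replace atomic_long_t, and the in_window flag is an assumed
stand-in for the jiffies comparison against the update window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; plain longs replace atomic_long_t. */
static long calc_load_tasks;
static long calc_load_tasks_deferred;

struct rq_model {
	long nr_active;        /* nr_running + nr_uninterruptible */
	long calc_load_active; /* value last folded by this cpu */
};

/*
 * in_window is an assumed stand-in for the jiffies comparison: true when
 * this cpu has already folded its count and the window is still open.
 */
static void account_active(struct rq_model *rq, bool in_window)
{
	long delta;

	assert(rq != NULL);
	if (rq->nr_active == rq->calc_load_active)
		return;

	delta = rq->nr_active - rq->calc_load_active;
	rq->calc_load_active = rq->nr_active;

	if (in_window) {
		/* Park the change instead of touching the global count. */
		calc_load_tasks_deferred += delta;
		return;
	}

	/* Outside the window: fold in anything parked earlier. */
	delta += calc_load_tasks_deferred;
	calc_load_tasks_deferred = 0;
	calc_load_tasks += delta;
}

/* One cpu runs a task, then goes idle inside the window. */
static long sample_with_deferral(void)
{
	struct rq_model rq = { .nr_active = 1, .calc_load_active = 0 };

	calc_load_tasks = 0;
	calc_load_tasks_deferred = 0;

	account_active(&rq, false); /* first tick: global count becomes 1 */
	rq.nr_active = 0;
	account_active(&rq, true);  /* idle in window: change is parked */

	return calc_load_tasks;     /* what the load avg would sample */
}
```

In this model the global count stays at 1 through the window, and the
parked change of -1 is folded in by the next update outside the window.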

A few points:

* A global atomic deferral counter, rather than per-cpu vars, is needed
  because a cpu may go NOHZ idle and then be unable to update the global
  calc_load_tasks variable for subsequent load calculations.
* It is not enough to add calls that account for the load when a cpu is
  awakened:
  - Load avg calculation must be independent of cpu load.
  - If a cpu is awakened by one task, but then has more tasks scheduled
    before the end of the update window, only the first task will be
    accounted for.

BugLink: http://bugs.launchpad.net/bugs/513848
Signed-off-by: Chase Douglas <chase.douglas@...onical.com>
Acked-by: Colin King <colin.king@...onical.com>
Acked-by: Andy Whitcroft <apw@...onical.com>
---
kernel/sched.c | 24 ++++++++++++++++++++++--
1 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index abb36b1..be348cd 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3010,6 +3010,7 @@ unsigned long this_cpu_load(void)
/* Variables and functions for calc_load */
static atomic_long_t calc_load_tasks;
+static atomic_long_t calc_load_tasks_deferred;
static unsigned long calc_load_update;
unsigned long avenrun[3];
EXPORT_SYMBOL(avenrun);
@@ -3064,7 +3065,7 @@ void calc_global_load(void)
*/
static void calc_load_account_active(struct rq *this_rq)
{
- long nr_active, delta;
+ long nr_active, delta, deferred;
nr_active = this_rq->nr_running;
nr_active += (long) this_rq->nr_uninterruptible;
@@ -3072,6 +3073,25 @@ static void calc_load_account_active(struct rq *this_rq)
if (nr_active != this_rq->calc_load_active) {
delta = nr_active - this_rq->calc_load_active;
this_rq->calc_load_active = nr_active;
+
+ /*
+ * Update calc_load_tasks only once per cpu in 10 tick update
+ * window.
+ */
+ if (unlikely(time_before(jiffies, this_rq->calc_load_update) &&
+ time_after_eq(jiffies, calc_load_update))) {
+ if (delta)
+ atomic_long_add(delta,
+ &calc_load_tasks_deferred);
+ return;
+ }
+
+ if (atomic_long_read(&calc_load_tasks_deferred)) {
+ deferred = atomic_long_xchg(&calc_load_tasks_deferred,
+ 0);
+ delta += deferred;
+ }
+
atomic_long_add(delta, &calc_load_tasks);
}
}
@@ -3106,8 +3126,8 @@ static void update_cpu_load(struct rq *this_rq)
}
if (time_after_eq(jiffies, this_rq->calc_load_update)) {
- this_rq->calc_load_update += LOAD_FREQ;
calc_load_account_active(this_rq);
+ this_rq->calc_load_update += LOAD_FREQ;
}
}
--
1.6.3.3