[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080430171521.5454.91644.stgit@ghaskins-t60p.haskins.net>
Date: Wed, 30 Apr 2008 13:15:57 -0400
From: Gregory Haskins <ghaskins@...ell.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <peterz@...radead.org>,
David Bahi <dbahi@...ell.com>,
Gregory Haskins <ghaskins@...ell.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH] sched: fix inv_weight calc
We currently have a bug in sched-devel where the system will fail to
balance tasks if CONFIG_FAIR_GROUP_SCHED=n. To reproduce, simply launch
a workload with multiple tasks and observe (either via top or
/proc/sched_debug) that the tasks do not distribute much (if at all)
around to all available cores. Instead, they tend to clump on one processor
while the other cores are idle.
Bisecting, we found the culprit to be:
commit 1b9552e878a5db3388eba8660e8d8400020a07e9
Author: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Date: Tue Apr 29 13:47:36 2008 +0200
Subject: sched: higher granularity load on 64bit systems
Once we identified this patch as the problem, I studied what possible
effect it could have with FAIR_GROUP_SCHED=n vs y. Most of the code in
1b9552e8 would be compiled out if we disable group-scheduling, but there
is one particular logic change in calc_delta_mine() that affects both modes
that looked suspicious. It changes the computation of the inverse-weight
from:
inv_weight = (WMULT_CONST-weight/2)/(weight+1)
to
inv_weight = 1+(WMULT_CONST-weight/2)/(weight+1)
This patch restores the algorithm to its original logic, and seems to solve
the regression for me. I can't really wrap my head around the original
intent of the "+1" change, or whether reverting the change will cause a
ripple effect somewhere else. All I can confirm is that the system will
once again balance load with this logic reverted to its previous form.
Thanks to my colleage, David Bahi, for doing all the legwork on the bisect.
And thanks to Peter Zijlstra for guiding me on all things CFS as I stuggle
to come up to speed on the non-RT portions of the scheduler.
Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
CC: David Bahi <dbahi@...ell.com>
CC: Ingo Molnar <mingo@...e.hu>
CC: Peter Zijlstra <peterz@...radead.org>
---
kernel/sched.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 32ef6c8..8326e20 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1562,7 +1562,7 @@ calc_delta_mine(unsigned long delta_exec, unsigned long weight,
if (unlikely(!lw->inv_weight)) {
unsigned long inv_wls = inv_WLS(lw->weight);
- lw->inv_weight = 1 + (WMULT_CONST-inv_wls/2) / (inv_wls+1);
+ lw->inv_weight = (WMULT_CONST-inv_wls/2) / (inv_wls+1);
}
tmp = inv_WLS((u64)delta_exec * weight);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists