Message-ID: <25935.1271304912@neuling.org>
Date: Thu, 15 Apr 2010 14:15:12 +1000
From: Michael Neuling <mikey@...ling.org>
To: Peter Zijlstra <peterz@...radead.org>
cc: Benjamin Herrenschmidt <benh@...nel.crashing.org>,
linuxppc-dev@...abs.org, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...e.hu>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Gautham R Shenoy <ego@...ibm.com>
Subject: Re: [PATCH 4/5] sched: Mark the balance type for use in need_active_balance()
> On Fri, 2010-04-09 at 16:21 +1000, Michael Neuling wrote:
> > need_active_balance() gates the asymmetric packing balance due to
> > the power save logic, but for packing we don't care.
>
> This explanation lacks a how/why.
>
> So the problem is that need_active_balance() ends up returning false and
> prevents the active balance from pulling a task to a lower available SMT
> sibling?
Correct. I've put a more detailed description in the patch below.
> > This marks the type of balance we are attempting to perform from
> > f_b_g() and stops the need_active_balance() power save logic from
> > gating a balance in the asymmetric packing case.
>
> At the very least this wants more comments in the code.
Sorry again for the lackluster comments. I've updated this patch also.
> I'm not really charmed by having to add yet another variable to pass
> around that mess, but I can't seem to come up with something cleaner
> either.
Yeah, the current code only ever reads the balance type in the !=
BALANCE_PACKING check, so a full enum might be overkill, but I thought
it might come in useful for someone else.
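For illustration, a minimal sketch of the flag-only alternative (purely
hypothetical, not what the patch below does):

	/* Hypothetical alternative: a bare flag instead of the enum,
	 * set by f_b_g() when check_asym_packing() picks the busiest
	 * group */
	int packing_balance = 0;

	if (idle == CPU_NEWLY_IDLE && !packing_balance) {
		/* existing power save gating ... */
	}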
Updated patch below.
Mikey
[PATCH 4/5] sched: stop need_active_balance() from preventing asymmetric packing
need_active_balance() contains power save logic that decides whether
the only running task on a non-idle CPU may be pulled to a newly idle
package in an attempt to completely free up the source package so it
can be powered down.  When it decides against the move, it returns
false to load_balance() and prevents the active balance from occurring.
Unfortunately, when asymmetric packing is enabled at the sibling level,
this power save logic prevents the packing balance from moving a task
to a lower idle thread.  At the sibling level SD_SHARE_CPUPOWER is set
and !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE) holds, and the domain
is also non-idle (since we have at least one task we are trying to move
down).  Hence the following code prevents an active balance from
occurring:
	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
		return 0;
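(For reference, test_sd_parent() just checks the parent domain's flags;
roughly, going by the kernel this patch is against:

	static inline int test_sd_parent(struct sched_domain *sd, int flag)
	{
		sd = sd->parent;
		if (sd && (sd->flags & flag))
			return 1;
		return 0;
	}

so at the SMT level the gate fires whenever the parent has not enabled
power savings balancing.)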
To fix this, this patch classifies the type of balance we are
attempting to perform as none, load, power or packing, based on which
check finds the busiest group in f_b_g().  This classification is then
used by need_active_balance() to stop the above power saving logic from
gating a balance triggered by asymmetric packing.  This ensures tasks
can be correctly moved down to lower sibling threads.
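In condensed form, the resulting flow looks like this (a sketch, not
the literal diff below):

	enum balance_type bt;

	/* f_b_g() sets bt to BALANCE_PACKING, BALANCE_LOAD,
	 * BALANCE_POWER or BALANCE_NONE depending on which check
	 * found the busiest group */
	group = find_busiest_group(sd, this_cpu, &imbalance, idle,
				   &sd_idle, cpus, balance, &bt);
	...
	if (need_active_balance(sd, sd_idle, idle, &bt)) {
		/* the newly idle power save gate above is skipped
		 * whenever bt == BALANCE_PACKING */
		...
	}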
Signed-off-by: Michael Neuling <mikey@...ling.org>
---
kernel/sched_fair.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
Index: linux-2.6-ozlabs/kernel/sched_fair.c
===================================================================
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -91,6 +91,14 @@ const_debug unsigned int sysctl_sched_mi
static const struct sched_class fair_sched_class;
+/* Enum to classify the type of balance we are attempting to perform */
+enum balance_type {
+ BALANCE_NONE = 0,
+ BALANCE_LOAD,
+ BALANCE_POWER,
+ BALANCE_PACKING
+};
+
/**************************************************************
* CFS operations on generic schedulable entities:
*/
@@ -2803,16 +2811,19 @@ static inline void calculate_imbalance(s
* @cpus: The set of CPUs under consideration for load-balancing.
* @balance: Pointer to a variable indicating if this_cpu
* is the appropriate cpu to perform load balancing at this_level.
+ * @bt: returns the type of imbalance found
*
* Returns: - the busiest group if imbalance exists.
* - If no imbalance and user has opted for power-savings balance,
* return the least loaded group whose CPUs can be
* put to idle by rebalancing its tasks onto our group.
+ * - *bt classifies the type of imbalance found
*/
static struct sched_group *
find_busiest_group(struct sched_domain *sd, int this_cpu,
unsigned long *imbalance, enum cpu_idle_type idle,
- int *sd_idle, const struct cpumask *cpus, int *balance)
+ int *sd_idle, const struct cpumask *cpus, int *balance,
+ enum balance_type *bt)
{
struct sd_lb_stats sds;
@@ -2837,6 +2848,7 @@ find_busiest_group(struct sched_domain *
if (!(*balance))
goto ret;
+ *bt = BALANCE_PACKING;
if ((idle == CPU_IDLE || idle == CPU_NEWLY_IDLE) &&
check_asym_packing(sd, &sds, this_cpu, imbalance))
return sds.busiest;
@@ -2857,6 +2869,7 @@ find_busiest_group(struct sched_domain *
/* Looks like there is an imbalance. Compute it */
calculate_imbalance(&sds, this_cpu, imbalance);
+ *bt = BALANCE_LOAD;
return sds.busiest;
out_balanced:
@@ -2864,10 +2877,12 @@ out_balanced:
* There is no obvious imbalance. But check if we can do some balancing
* to save power.
*/
+ *bt = BALANCE_POWER;
if (check_power_save_busiest_group(&sds, this_cpu, imbalance))
return sds.busiest;
ret:
*imbalance = 0;
+ *bt = BALANCE_NONE;
return NULL;
}
@@ -2928,9 +2943,18 @@ find_busiest_queue(struct sched_group *g
/* Working cpumask for load_balance and load_balance_newidle. */
static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
-static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle)
+static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
+ enum balance_type *bt)
{
- if (idle == CPU_NEWLY_IDLE) {
+ /*
+ * The powersave code will stop a task being moved in an
+ * attempt to free up a CPU package which could be powered
+ * down. In the case where we are attempting to balance due to
+ * asymmetric packing at the sibling level, we don't care
+ * about power save. Hence prevent powersave stopping a
+ * balance triggered by packing.
+ */
+ if (idle == CPU_NEWLY_IDLE && *bt != BALANCE_PACKING) {
/*
* The only task running in a non-idle cpu can be moved to this
* cpu in an attempt to completely freeup the other CPU
@@ -2975,6 +2999,7 @@ static int load_balance(int this_cpu, st
struct rq *busiest;
unsigned long flags;
struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
+ enum balance_type bt;
cpumask_copy(cpus, cpu_active_mask);
@@ -2993,7 +3018,7 @@ static int load_balance(int this_cpu, st
redo:
update_shares(sd);
group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle,
- cpus, balance);
+ cpus, balance, &bt);
if (*balance == 0)
goto out_balanced;
@@ -3047,7 +3072,7 @@ redo:
schedstat_inc(sd, lb_failed[idle]);
sd->nr_balance_failed++;
- if (need_active_balance(sd, sd_idle, idle)) {
+ if (need_active_balance(sd, sd_idle, idle, &bt)) {
raw_spin_lock_irqsave(&busiest->lock, flags);
/* don't kick the migration_thread, if the curr
--