Message-Id: <1267817792.6384.37.camel@marge.simson.net>
Date: Fri, 05 Mar 2010 20:36:32 +0100
From: Mike Galbraith <efault@....de>
To: Suresh Siddha <suresh.b.siddha@...el.com>
Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Ingo Molnar <mingo@...e.hu>,
Arjan van de Ven <arjan@...ux.jf.intel.com>,
linux-kernel@...r.kernel.org,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Yanmin Zhang <yanmin_zhang@...ux.jf.intel.com>,
Gautham R Shenoy <ego@...ibm.com>
Subject: Re: [patch 1/2] sched: check for prev_cpu == this_cpu in
wake_affine()
On Fri, 2010-03-05 at 10:39 -0800, Suresh Siddha wrote:
> plain text document attachment (fix_wake_affine.patch)
> On a single cpu system with SMT, in the scenario of one SMT thread being
> idle with another SMT thread running a task and doing a non sync wakeup of
> another task, we see (from the traces) that the woken up task ends up running
> on the busy thread instead of the idle thread. Idle balancing that comes
> in a little bit later fixes the scenario.
Yup, wake_affine() fails for a non-sync wakeup when 1 task is running.
That's annoying, but making it succeed globally worries me. We need a
high quality hint, and avg_overlap ain't it unfortunately, because to
get accurate overlap info cross cpu, you have to double the clock and
update_curr() overhead. We need something dirt cheap.
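To make that failure mode concrete, here is a toy userspace model of the
check. It is NOT the kernel's wake_affine() (the real thing goes through
effective_load() and sd->imbalance_pct); the weights and the 25% slack
below are made up, and waker and wakee are assumed to be equal weight.
It just shows why a non-sync wakeup from a CPU running one task gets
rejected while a sync wakeup passes:

/* toy model only, not kernel code */
#include <stdbool.h>
#include <stdio.h>

struct toy_cpu {
	unsigned long load;	/* sum of runnable task weights */
};

static bool toy_wake_affine(struct toy_cpu *this, struct toy_cpu *prev,
			    unsigned long wakee_weight, bool sync)
{
	unsigned long this_load = this->load;

	/*
	 * A sync wakeup promises the waker is about to sleep, so its
	 * weight (assumed equal to the wakee's here) can be discounted
	 * from the waking CPU.
	 */
	if (sync && this_load >= wakee_weight)
		this_load -= wakee_weight;

	/*
	 * Accept the pull if the waking CPU looks empty, or would not end
	 * up much more loaded than prev_cpu (25% slack is invented here).
	 */
	return this_load == 0 ||
	       (this_load + wakee_weight) * 100 <= prev->load * 125;
}

int main(void)
{
	/* single package with SMT: prev_cpu == this_cpu == the busy thread */
	struct toy_cpu busy = { .load = 1024 };

	printf("non-sync: %d\n", toy_wake_affine(&busy, &busy, 1024, false)); /* 0 */
	printf("sync:     %d\n", toy_wake_affine(&busy, &busy, 1024, true));  /* 1 */
	return 0;
}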
> But fixing this wake balance and running the woken up task directly on the
> idle SMT thread improved the performance (phoronix 7zip compression workload)
> by ~9% on an atom platform.
So there is profit to be had.
> During the process wakeup, select_task_rq_fair() and wake_affine() make
> the decision to wake up the task either on the previous cpu that the task
> ran on or on the cpu where it is currently being woken up.
>
> select_task_rq_fair() also checks whether there are any idle siblings
> of the cpu that the task is woken up on. This is to ensure that we select
> an idle sibling rather than a busy cpu.
Yeah, but with the 1 task + non-sync wakeup scenario, we miss the boat
because select_idle_sibling() uses wake_affine() success as its
enabler. I did that because I couldn't think up something else which
did not harm multiple buddy pairs. You can globally say sibling is
idle, go for it, but that _does_ cause throughput loss during ramp up.
Best alternative I've found is to only check for an idle sibling/cache
when there is exactly one task on the current cpu (i.e. put some faith in
load balancing), then force idle sibling selection. Also not optimal.
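FWIW, a rough sketch of that alternative (hypothetical names, userspace
toy only, not a patch): only go hunting for an idle sibling when the
waking CPU has exactly one runnable task, i.e. trust load balancing for
anything busier, and force the sibling as the target when one is found:

/* toy sketch only, not kernel code */
#include <stdio.h>

#define TOY_NR_CPUS	2
#define TOY_NO_CPU	(-1)

static int toy_nr_running[TOY_NR_CPUS] = { 1, 0 };	/* cpu0 busy, cpu1 idle */

/* scan the waking CPU's cache/SMT siblings for an idle one */
static int toy_idle_sibling(int this_cpu)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (cpu != this_cpu && toy_nr_running[cpu] == 0)
			return cpu;
	}
	return TOY_NO_CPU;
}

static int toy_select_wake_cpu(int this_cpu)
{
	/* exactly one task on the waking CPU: ramp-up case, try a sibling */
	if (toy_nr_running[this_cpu] == 1) {
		int sibling = toy_idle_sibling(this_cpu);

		if (sibling != TOY_NO_CPU)
			return sibling;		/* force idle sibling selection */
	}
	return this_cpu;	/* otherwise leave it to load balancing */
}

int main(void)
{
	printf("wake target: cpu%d\n", toy_select_wake_cpu(0));	/* cpu1 */
	return 0;
}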
> In the above load scenario, it so happens that the prev_cpu (that the
> task ran on before) and this_cpu (where it is woken up currently) are the same.
> In this case, it looks like wake_affine() returns 0 and we end up not selecting
> the idle sibling chosen by select_idle_sibling() in select_task_rq_fair().
> Further down the path of select_task_rq_fair(), we ultimately select
> the currently running cpu (the busy SMT thread instead of the idle SMT thread).
>
> Check for prev_cpu == this_cpu in wake_affine(), so there is no need to do
> any fancy stuff (and ultimately make wrong decisions) in this case.
I have a slightly different patch for that in my tree. There's no need
to even call wake_affine() since the result is meaningless.
---
kernel/sched_fair.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
Index: linux-2.6.34.git/kernel/sched_fair.c
===================================================================
--- linux-2.6.34.git.orig/kernel/sched_fair.c
+++ linux-2.6.34.git/kernel/sched_fair.c
@@ -1547,8 +1547,14 @@ static int select_task_rq_fair(struct ta
 	}
 #endif
 
-	if (affine_sd && wake_affine(affine_sd, p, sync))
-		return cpu;
+	if (affine_sd) {
+		if (cpu == prev_cpu)
+			return cpu;
+		if (wake_affine(affine_sd, p, sync))
+			return cpu;
+		if (!(affine_sd->flags & SD_BALANCE_WAKE))
+			return prev_cpu;
+	}
 
 	while (sd) {
 		int load_idx = sd->forkexec_idx;
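
For clarity, the new branch boils down to the following decision order.
This is a toy userspace restatement only (SD_BALANCE_WAKE is stood in by
a plain flag, and -1 stands for falling through to the domain scan that
follows the hunk):

#include <stdbool.h>
#include <stdio.h>

#define TOY_SD_BALANCE_WAKE 0x1

static int toy_affine_choice(int cpu, int prev_cpu, bool wake_affine_ok,
			     unsigned int sd_flags)
{
	if (cpu == prev_cpu)
		return cpu;		/* comparing a cpu against itself is meaningless */
	if (wake_affine_ok)
		return cpu;		/* pull the wakee to the waking cpu */
	if (!(sd_flags & TOY_SD_BALANCE_WAKE))
		return prev_cpu;	/* domain doesn't wake-balance: stay where we were */
	return -1;			/* fall through to the while (sd) domain scan */
}

int main(void)
{
	printf("%d\n", toy_affine_choice(0, 0, false, 0));			/*  0: prev == this */
	printf("%d\n", toy_affine_choice(0, 1, true,  TOY_SD_BALANCE_WAKE));	/*  0: affine pull */
	printf("%d\n", toy_affine_choice(0, 1, false, 0));			/*  1: keep prev_cpu */
	printf("%d\n", toy_affine_choice(0, 1, false, TOY_SD_BALANCE_WAKE));	/* -1: domain scan */
	return 0;
}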
--