Message-Id: <1262602785.9734.26.camel@marge.simson.net>
Date: Mon, 04 Jan 2010 11:59:45 +0100
From: Mike Galbraith <efault@....de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Lin Ming <ming.m.lin@...el.com>,
lkml <linux-kernel@...r.kernel.org>,
"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
Subject: Re: [RFC PATCH] sched: Pass affine target cpu into wake_affine

On Mon, 2010-01-04 at 10:32 +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-04 at 17:12 +0800, Lin Ming wrote:
> > On Mon, 2010-01-04 at 17:25 +0800, Peter Zijlstra wrote:
> > > On Mon, 2010-01-04 at 17:03 +0800, Lin Ming wrote:
> > > > commit a03ecf08d7bbdd979d81163ea13d194fe21ad339
> > > > Author: Lin Ming <ming.m.lin@...el.com>
> > > > Date: Mon Jan 4 14:14:50 2010 +0800
> > > >
> > > > sched: Pass affine target cpu into wake_affine
> > > >
> > > > Since commit a1f84a3(sched: Check for an idle shared cache in select_task_rq_fair()),
> > > > the affine target maybe adjusted to any idle cpu in cache sharing domains
> > > > instead of current cpu.
> > > > But wake_affine still use current cpu to calculate load which is wrong.
> > > >
> > > > This patch passes affine cpu into wake_affine.
> > > >
> > >
> > > Does this at all help with that regression?
> >
> > No.
>
> crap :/
>
> The change does look sensible though.

I piddled with all kinds of ways to get around calling wake_affine()
entirely, and/or calling it with the affine candidate, to no avail. The
best result always came from doing the silly looking thing, namely
testing the current cpu for the wake affine decision, but slipping in
the shared cache cpu as the target.

I bet the below helps, though there will still be cache misses, so there
will still be pain for extreme switchers. The question is whether the
ramp-up gain is worth it. I think yes, since it's up to 100%. It would
be most excellent to find a way to know in advance when the cost will be
too high, and then not go there. The same applies to doing the affinity
decision every time for extreme switchers. It's expensive for those,
especially so when they're pinned, but pays in the general case.

Anyway...

PREFER_SIBLING is set at the CPU domain level if you don't have power
saving set, so you get to eat cache misses for each cpu as you traverse,
whether it's sharing a cache or not. Lots of CPUs, LOTS of pain.
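
To make the cost argument concrete, here is a minimal userspace sketch, not kernel code. It models the domain walk with a toy struct and counts how many runqueues get inspected when the idle-sibling scan is gated on one flag or the other. Every name here (toy_domain, toy_pick_target, the TOY_ flag values) and the 2/4/16 CPU topology are assumptions made up for illustration. The flag placement follows the description above: PREFER_SIBLING only at the CPU level, SHARE_PKG_RESOURCES at the cache-sharing levels.

```c
#include <stddef.h>

/* Hypothetical flag bits for this sketch; the kernel's values differ. */
#define TOY_SD_SHARE_PKG_RESOURCES 0x1u
#define TOY_SD_PREFER_SIBLING      0x2u

#define TOY_NR_CPUS 16

/* Toy stand-in for a sched domain: a contiguous CPU span plus flags. */
struct toy_domain {
	int first_cpu;
	int nr_cpus;
	unsigned int flags;
	const struct toy_domain *parent;
};

/*
 * Assumed topology, as seen from CPU 0: 2 SMT siblings, 4 CPUs behind
 * one shared cache, and 16 CPUs at the package-spanning "CPU" level.
 */
static const struct toy_domain toy_cpu_level = {
	0, TOY_NR_CPUS, TOY_SD_PREFER_SIBLING, NULL
};
static const struct toy_domain toy_mc_level = {
	0, 4, TOY_SD_SHARE_PKG_RESOURCES, &toy_cpu_level
};
static const struct toy_domain toy_smt_level = {
	0, 2, TOY_SD_SHARE_PKG_RESOURCES, &toy_mc_level
};

struct toy_result {
	int target;	/* chosen CPU, or this_cpu if nothing idle was found */
	int probes;	/* runqueues inspected, a rough proxy for cache misses */
};

/*
 * Walk CPU 0's domains bottom-up and scan a span for an idle CPU only
 * when the domain carries gate_flag. Stops at the first idle CPU found.
 */
static struct toy_result toy_pick_target(const int *cpu_idle,
					 unsigned int gate_flag, int this_cpu)
{
	struct toy_result res = { this_cpu, 0 };
	const struct toy_domain *sd;

	for (sd = &toy_smt_level; sd; sd = sd->parent) {
		if (!(sd->flags & gate_flag))
			continue;
		for (int cpu = sd->first_cpu;
		     cpu < sd->first_cpu + sd->nr_cpus; cpu++) {
			res.probes++;
			if (cpu_idle[cpu]) {
				res.target = cpu;
				return res;
			}
		}
	}
	return res;
}
```

With every CPU busy, gating on TOY_SD_PREFER_SIBLING inspects all 16 runqueues in this model, while gating on TOY_SD_SHARE_PKG_RESOURCES inspects 6 (2 at the SMT level plus 4 at the cache level) and never leaves the shared cache. The real scan has more conditions than this, so treat the counts as a picture of the trend only.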
not-signed-off
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 57e6357..5b81156 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -99,7 +99,7 @@ int arch_update_cpu_topology(void);
 				| 1*SD_WAKE_AFFINE \
 				| 1*SD_SHARE_CPUPOWER \
 				| 0*SD_POWERSAVINGS_BALANCE \
-				| 0*SD_SHARE_PKG_RESOURCES \
+				| 1*SD_SHARE_PKG_RESOURCES \
 				| 0*SD_SERIALIZE \
 				| 0*SD_PREFER_SIBLING \
 				, \
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 42ac3c9..8fe7ee8 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1508,7 +1508,7 @@ static int select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flag
 			 * If there's an idle sibling in this domain, make that
 			 * the wake_affine target instead of the current cpu.
 			 */
-			if (tmp->flags & SD_PREFER_SIBLING)
+			if (tmp->flags & SD_SHARE_PKG_RESOURCES)
 				target = select_idle_sibling(p, tmp, target);
 
 			if (target >= 0) {
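
The topology.h hunk relies on the multiply-by-0-or-1 idiom: every flag stays listed in the initializer, and the multiplier switches it off or on. Below is a small standalone sketch of that idiom. The DEMO_ names and bit values are hypothetical and only stand in for the real SD_ flags.

```c
/* Hypothetical bit values for this sketch; the kernel's differ. */
#define DEMO_SD_WAKE_AFFINE         0x01u
#define DEMO_SD_SHARE_CPUPOWER      0x02u
#define DEMO_SD_SHARE_PKG_RESOURCES 0x04u
#define DEMO_SD_PREFER_SIBLING      0x08u

/* Flag set before the change: SHARE_PKG_RESOURCES multiplied by 0. */
#define DEMO_SIBLING_FLAGS_BEFORE (0u \
	| 1*DEMO_SD_WAKE_AFFINE \
	| 1*DEMO_SD_SHARE_CPUPOWER \
	| 0*DEMO_SD_SHARE_PKG_RESOURCES \
	| 0*DEMO_SD_PREFER_SIBLING)

/* Flag set after the change: only the one multiplier flips to 1. */
#define DEMO_SIBLING_FLAGS_AFTER (0u \
	| 1*DEMO_SD_WAKE_AFFINE \
	| 1*DEMO_SD_SHARE_CPUPOWER \
	| 1*DEMO_SD_SHARE_PKG_RESOURCES \
	| 0*DEMO_SD_PREFER_SIBLING)

/* Returns nonzero when the given flag word would pass the new test. */
static int demo_scans_for_idle_sibling(unsigned int flags)
{
	return (flags & DEMO_SD_SHARE_PKG_RESOURCES) != 0;
}
```

In this sketch demo_scans_for_idle_sibling() is false for the BEFORE set and true for the AFTER set, which is the pairing the two hunks are meant to produce at the sibling level.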