Message-ID: <1338307528.26856.106.camel@twins>
Date: Tue, 29 May 2012 18:05:28 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Bharata B Rao <bharata.rao@...il.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH 21/35] autonuma: teach CFS about autonuma affinity
On Fri, 2012-05-25 at 19:02 +0200, Andrea Arcangeli wrote:
> The CFS scheduler is still in charge of all scheduling
> decisions. AutoNUMA balancing will at times override those, but
> generally we'll just rely on the CFS scheduler to keep doing its
> thing, while preferring the autonuma affine nodes when deciding
> to move a process to a different runqueue or when waking it up.
>
> For example, idle balancing will look into the runqueues of the
> busy CPUs, but it'll first search for a task that wants to run on
> the idle CPU in AutoNUMA terms (task_autonuma_cpu() being true).
>
> Most of this is encoded in can_migrate_task becoming AutoNUMA
> aware and running two passes for each balancing pass: the first
> NUMA aware, and the second relaxed.
>
> The idle/newidle balancing is always allowed to fall back to
> non-affine AutoNUMA tasks. Load balancing (which is more a
> fairness than a performance issue) can instead cross the
> AutoNUMA affinity only if the flag controlled by
> /sys/kernel/mm/autonuma/scheduler/load_balance_strict is not set (it
> is set by default).
This is unacceptable, and contradicts your earlier claim that you rely
on the regular load-balancer.
The strict mode needs to go; load-balancing is a best effort and
fairness is important -- so much so that some people complain the
current behaviour isn't strong enough.
Your strict mode basically supplants any and all balancing done at node
level and above.
Please use something like:
https://lkml.org/lkml/2012/5/19/53
with the sched_setnode() function from:
https://lkml.org/lkml/2012/5/18/109
Fairness matters because people expect similar throughput and runtimes,
so the order should be: first balance so that load is equal across CPUs,
and only then bother with node placement.
Furthermore, the load-balancer does things like trying to place tasks
that wake each other closer together; your strict mode completely breaks
that. Instead, if the balancer finds these tasks are related and should
be together, that should be a hint that the memory needs to come to
them, not the other way around.