Message-ID: <20090818131558.GO29515@alberich.amd.com>
Date: Tue, 18 Aug 2009 15:15:58 +0200
From: Andreas Herrmann <andreas.herrmann3@....com>
To: Ingo Molnar <mingo@...e.hu>
CC: Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/12] cleanup __build_sched_domains()
On Tue, Aug 18, 2009 at 01:16:44PM +0200, Ingo Molnar wrote:
>
> * Andreas Herrmann <andreas.herrmann3@....com> wrote:
>
> > Hi,
> >
> > Following patches try to make __build_sched_domains() less ugly
> > and more readable. They shouldn't be harmful. Thus I think they
> > can be applied for .32.
> >
> > Patches are against tip/master as of today.
> >
> > FYI, I need those patches as a base for introducing a new domain
> > level for multi-node CPUs for which I intend to send patches as
> > RFC asap.
>
> Very nice cleanups!
>
> Magny-Cours indeed will need one more sched-domains level,
> something like:
>
> [smt thread]
> core
> internal numa node
> cpu socket
> external numa node

My current approach is to place the NUMA node domain either below the
CPU domain (on a multi-node CPU where the SRAT describes each internal
node as a NUMA node) or, as it is today, at the top level (e.g. with
node interleaving, or when ACPI SRAT detection is missing or broken).

Sched domain levels (note: SMT == SIBLING, NODE == NUMA), listed from
lowest to highest, for the two cases:

(1) groups in the NUMA domain are subsets of groups in the CPU domain
(2) groups in the NUMA domain are supersets of groups in the CPU domain

    (1)         | (2)
    ------------|------------
    SMT         | SMT
    MC          | MC
    MN (new)    | MN
    NUMA        | CPU
    CPU         | NUMA
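
As a rough userspace illustration (not kernel code) of how the two
orderings could be told apart: the sketch below groups CPUs by NUMA node
and by socket and picks case (1) only when every node fits inside one
socket and at least one node is a strict subset of a socket. The names
pick_levels and group_cpus and the cpu -> id dictionaries are
hypothetical, and treating "node equals socket" as case (2) is an
assumption based on the "as is" wording above.

```python
from collections import defaultdict

# Bottom-up level orderings taken from the two columns of the table.
LEVELS_NUMA_BELOW_CPU = ["SMT", "MC", "MN", "NUMA", "CPU"]  # case (1)
LEVELS_NUMA_ON_TOP = ["SMT", "MC", "MN", "CPU", "NUMA"]     # case (2)


def group_cpus(cpu_to_id):
    """Invert a cpu -> id mapping into a list of CPU sets, one per id."""
    groups = defaultdict(set)
    for cpu, ident in cpu_to_id.items():
        groups[ident].add(cpu)
    return list(groups.values())


def pick_levels(cpu_to_node, cpu_to_socket):
    """Return the bottom-up level ordering for a hypothetical topology."""
    nodes = group_cpus(cpu_to_node)
    sockets = group_cpus(cpu_to_socket)
    # Every NUMA node must lie completely inside some socket ...
    every_node_fits = all(any(n <= s for s in sockets) for n in nodes)
    # ... and at least one node must be smaller than its socket
    # (assumption: equal spans keep NUMA as the top-level domain).
    some_node_strict = any(n < s for n in nodes for s in sockets)
    if every_node_fits and some_node_strict:
        return LEVELS_NUMA_BELOW_CPU
    return LEVELS_NUMA_ON_TOP
```

For example, eight CPUs with two nodes per socket (node = cpu // 2,
socket = cpu // 4) select the case (1) ordering, while node interleaving
(all CPUs in one node) selects case (2).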

I'll also introduce a new parameter, sched_mn_power_savings, which
causes tasks to be scheduled on one socket until its capacity is
reached. Only then are other sockets used as well.

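
The intended effect of such a policy can be modelled in a few lines.
This is only a toy sketch under stated assumptions, not the scheduler's
load-balancing code: pick_socket and place_tasks are hypothetical
helpers, load is counted in whole tasks, and falling back to the least
loaded socket once every socket is full is an assumption. Only the
parameter name sched_mn_power_savings comes from the text above.

```python
def pick_socket(loads, capacity, mn_power_savings):
    """Return the index of the socket that should receive the next task."""
    if mn_power_savings:
        # Pack: take the first socket that still has spare capacity.
        for idx, load in enumerate(loads):
            if load < capacity:
                return idx
    # Spread (and fallback once every socket is full): least loaded socket.
    return min(range(len(loads)), key=lambda idx: loads[idx])


def place_tasks(num_tasks, num_sockets, capacity, mn_power_savings):
    """Place tasks one by one and return the resulting per-socket load."""
    loads = [0] * num_sockets
    for _ in range(num_tasks):
        loads[pick_socket(loads, capacity, mn_power_savings)] += 1
    return loads
```

With two sockets of capacity 4 and six tasks, the packing policy yields
loads of [4, 2], while the spreading policy yields [3, 3].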
> ... which is certainly interesting, especially since the hierarchy
> possibly 'crosses', i.e. we might have the two internal numa nodes
> share a L2 or L3 cache, right?
> I'd also not be surprised if the load-balancer needed some care to
> properly handle such a setup.

It does need some care, and it gave me some headaches to get it working
in all cases (i.e. NUMA, no-NUMA, NUMA-but-no-SRAT). My current code
(which still needs to be split into proper patches for submission) works
fine in all but one case, which I am still debugging.

The failing case is a normal (non-multi-node) NUMA system, where
switching to the power policy has no effect on already running tasks.
Only newly created tasks are scheduled according to the power policy.

> It's all welcome work in any case, and for .32.
Thanks,
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632