Message-ID: <528268C8.8010501@arm.com>
Date: Tue, 12 Nov 2013 17:43:36 +0000
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: Peter Zijlstra <peterz@...radead.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>
CC: Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>, Paul Turner <pjt@...gle.com>,
Morten Rasmussen <Morten.Rasmussen@....com>,
"cmetcalf@...era.com" <cmetcalf@...era.com>,
"tony.luck@...el.com" <tony.luck@...el.com>,
Alex Shi <alex.shi@...el.com>,
Preeti U Murthy <preeti@...ux.vnet.ibm.com>,
"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Jonathan Corbet <corbet@....net>,
Thomas Gleixner <tglx@...utronix.de>,
Len Brown <len.brown@...el.com>,
Arjan van de Ven <arjan@...ux.intel.com>,
Amit Kucheria <amit.kucheria@...aro.org>,
Lukasz Majewski <l.majewski@...sung.com>,
"james.hogan@...tec.com" <james.hogan@...tec.com>,
"heiko.carstens@...ibm.com" <heiko.carstens@...ibm.com>
Subject: Re: [RFC][PATCH v5 01/14] sched: add a new arch_sd_local_flags for
sched_domain init
On 06/11/13 14:08, Peter Zijlstra wrote:
> On Wed, Nov 06, 2013 at 02:53:44PM +0100, Martin Schwidefsky wrote:
>> On Tue, 5 Nov 2013 23:27:52 +0100
>> Peter Zijlstra <peterz@...radead.org> wrote:
>>
>>> On Tue, Nov 05, 2013 at 03:57:23PM +0100, Vincent Guittot wrote:
>>>> Your proposal looks fine for me. It's clearly better to move in one
>>>> place the configuration of sched_domain fields. Have you already got
>>>> an idea about how to let architecture override the topology?
>>>
>>> Maybe something like the below -- completely untested (my s390 compiler
>>> is on a machine that's currently powered off).
>>
>> In principle I do not see a reason why this should not work, but there
>> are a few more things to take care of. E.g. struct sd_data is defined
>> in kernel/sched/core.c, cpu_cpu_mask as well. These need to be moved
>> to a header where arch/s390/kernel/smp.c can pick it up.
>>
>> I do have the feeling that the sched_domain_topology should be left
>> where they are, or do we really want to expose more of the scheduler
>> internals?
>
> Ah, its a trade off; in that previous patch I removed the entire
> sched_domain initializers the archs used to 'have' to fill out. That
> exposed far too much behavioural stuff the archs really shouldn't
> bother with.
>
> In return we now provide a (hopefully) simpler interface that allows
> archs to communicate their topology to the scheduler -- without getting
> mixed up in the behavioural aspects (too much).
>
> Maybe s390 wasn't the best example to pick, as the book domain really
> isn't that exciting. Arguably I should have taken Power7+ and the
> ASYM_PACKING SMT thing.
>
>
We actually don't have to expose sched_domain_topology or any internal
scheduler data structures.

We can still get rid of the SD_XXX_INIT stuff and do the sched_domain
initialization for all levels in one function, sd_init().

Moreover, we could introduce an arch-specific general function replacing
the arch-specific functions for particular flags and levels, like
arch_sd_sibling_asym_packing() or Vincent's arch_sd_local_flags().
This general function exposes the level and the sched_domain pointer
to the arch, which can then fine-tune the sched_domain at each
individual level.
Below is a patch based on your idea of transforming sd_numa_init()
into sd_init(). The main difference is that inside sd_init() I don't
try to distinguish based on power-management-related flags but rather
on the new sd level data.
Dietmar
----8<----
From 3df278ad50690a7878c9cc6b18e226805e1f4bd1 Mon Sep 17 00:00:00 2001
From: Dietmar Eggemann <dietmar.eggemann@....com>
Date: Tue, 12 Nov 2013 12:37:36 +0000
Subject: [PATCH] sched: rework sched_domain setup code
This patch removes the sched_domain initializer macros
SD_[SIBLING|MC|BOOK|CPU]_INIT in core.c and in archs and replaces them
with calls to the new function sd_init(). The function sd_init
incorporates the already existing function sd_numa_init().
It introduces preprocessor constants (SD_LVL_[INV|SMT|MC|BOOK|CPU|NUMA])
and replaces the 'sched_domain_init_f init' member with an 'int level'
data member in struct sched_domain_topology_level.
The new data member is used to distinguish the sched_domain level in
sd_init() and is also passed as an argument to the arch specific
function to tweak the sched_domain described below.
To make it still possible for archs to tweak individual sched_domain
levels, a new weak function arch_sd_customize(int level,
struct sched_domain *sd, int cpu) is introduced.
By exposing the sched_domain level and a pointer to the sched_domain
data structure, archs can tweak individual data members, like the
min or max interval or the flags. This function also replaces the
existing function arch_sd_sibiling_asym_packing(), which is specialized
in setting the SD_ASYM_PACKING flag for the SMT sched_domain level.
The parameter cpu is currently not used but could be used in the
future to set up sched_domain structures in one sched_domain level
differently for different cpus.
Initialization of a sched_domain is done in three steps. First, at the
beginning of sd_init(), the sched_domain data members that have the
same value for all (or at least most) sched_domain levels are set.
Second, sched_domain data members are set individually for each
sched_domain level in sd_init(). Third, arch_sd_customize() is called
in sd_init().
One exception is SD_NODE_INIT, which this patch removes from
arch/metag/include/asm/topology.h. I don't know how it's been used, so
this patch does not provide a metag-specific arch_sd_customize()
implementation.
This patch has been tested on ARM TC2 (5 CPUs, sched_domain level MC
and CPU) and compile-tested for x86_64, powerpc (chroma_defconfig) and
mips (ip27_defconfig).
It is against v3.12.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@....com>
---
arch/ia64/include/asm/topology.h | 24 -----
arch/ia64/kernel/topology.c | 8 ++
arch/metag/include/asm/topology.h | 25 -----
arch/powerpc/kernel/smp.c | 7 +-
arch/tile/include/asm/topology.h | 33 ------
arch/tile/kernel/smp.c | 12 +++
include/linux/sched.h | 8 +-
include/linux/topology.h | 109 -------------------
kernel/sched/core.c | 214 +++++++++++++++++++++----------------
9 files changed, 150 insertions(+), 290 deletions(-)
diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
index a2496e4..20d12fa 100644
--- a/arch/ia64/include/asm/topology.h
+++ b/arch/ia64/include/asm/topology.h
@@ -46,30 +46,6 @@
void build_cpu_to_node_map(void);
-#define SD_CPU_INIT (struct sched_domain) { \
- .parent = NULL, \
- .child = NULL, \
- .groups = NULL, \
- .min_interval = 1, \
- .max_interval = 4, \
- .busy_factor = 64, \
- .imbalance_pct = 125, \
- .cache_nice_tries = 2, \
- .busy_idx = 2, \
- .idle_idx = 1, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
- .flags = SD_LOAD_BALANCE \
- | SD_BALANCE_NEWIDLE \
- | SD_BALANCE_EXEC \
- | SD_BALANCE_FORK \
- | SD_WAKE_AFFINE, \
- .last_balance = jiffies, \
- .balance_interval = 1, \
- .nr_balance_failed = 0, \
-}
-
#endif /* CONFIG_NUMA */
#ifdef CONFIG_SMP
diff --git a/arch/ia64/kernel/topology.c b/arch/ia64/kernel/topology.c
index ca69a5a..5dd627d 100644
--- a/arch/ia64/kernel/topology.c
+++ b/arch/ia64/kernel/topology.c
@@ -99,6 +99,14 @@ out:
subsys_initcall(topology_init);
+void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
+{
+ if (level == SD_LVL_CPU) {
+ sd->cache_nice_tries = 2;
+
+ sd->flags &= ~SD_PREFER_SIBLING;
+ }
+}
/*
* Export cpu cache information through sysfs
diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h
index 23f5118..e95f874 100644
--- a/arch/metag/include/asm/topology.h
+++ b/arch/metag/include/asm/topology.h
@@ -3,31 +3,6 @@
#ifdef CONFIG_NUMA
-/* sched_domains SD_NODE_INIT for Meta machines */
-#define SD_NODE_INIT (struct sched_domain) { \
- .parent = NULL, \
- .child = NULL, \
- .groups = NULL, \
- .min_interval = 8, \
- .max_interval = 32, \
- .busy_factor = 32, \
- .imbalance_pct = 125, \
- .cache_nice_tries = 2, \
- .busy_idx = 3, \
- .idle_idx = 2, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
- .flags = SD_LOAD_BALANCE \
- | SD_BALANCE_FORK \
- | SD_BALANCE_EXEC \
- | SD_BALANCE_NEWIDLE \
- | SD_SERIALIZE, \
- .last_balance = jiffies, \
- .balance_interval = 1, \
- .nr_balance_failed = 0, \
-}
-
#define cpu_to_node(cpu) ((void)(cpu), 0)
#define parent_node(node) ((void)(node), 0)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8e59abc..9ac5bfb 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -802,13 +802,12 @@ void __init smp_cpus_done(unsigned int max_cpus)
}
-int arch_sd_sibling_asym_packing(void)
+void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
{
- if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
+ if (level == SD_LVL_SMT && cpu_has_feature(CPU_FTR_ASYM_SMT)) {
printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
- return SD_ASYM_PACKING;
+ sd->flags |= SD_ASYM_PACKING;
}
- return 0;
}
#ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index d15c0d8..9383118 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -44,39 +44,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
/* For now, use numa node -1 for global allocation. */
#define pcibus_to_node(bus) ((void)(bus), -1)
-/*
- * TILE architecture has many cores integrated in one processor, so we need
- * setup bigger balance_interval for both CPU/NODE scheduling domains to
- * reduce process scheduling costs.
- */
-
-/* sched_domains SD_CPU_INIT for TILE architecture */
-#define SD_CPU_INIT (struct sched_domain) { \
- .min_interval = 4, \
- .max_interval = 128, \
- .busy_factor = 64, \
- .imbalance_pct = 125, \
- .cache_nice_tries = 1, \
- .busy_idx = 2, \
- .idle_idx = 1, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
- \
- .flags = 1*SD_LOAD_BALANCE \
- | 1*SD_BALANCE_NEWIDLE \
- | 1*SD_BALANCE_EXEC \
- | 1*SD_BALANCE_FORK \
- | 0*SD_BALANCE_WAKE \
- | 0*SD_WAKE_AFFINE \
- | 0*SD_SHARE_CPUPOWER \
- | 0*SD_SHARE_PKG_RESOURCES \
- | 0*SD_SERIALIZE \
- , \
- .last_balance = jiffies, \
- .balance_interval = 32, \
-}
-
/* By definition, we create nodes based on online memory. */
#define node_has_online_mem(nid) 1
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index 01e8ab2..dfafe55 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -254,3 +254,15 @@ void smp_send_reschedule(int cpu)
}
#endif /* CHIP_HAS_IPI() */
+
+void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
+{
+ if (level == SD_LVL_CPU) {
+ sd->min_interval = 4;
+ sd->max_interval = 128;
+
+ sd->flags &= ~(SD_WAKE_AFFINE | SD_PREFER_SIBLING);
+
+ sd->balance_interval = 32;
+ }
+}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e27baee..847485d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -769,7 +769,13 @@ enum cpu_idle_type {
#define SD_PREFER_SIBLING 0x1000 /* Prefer to place tasks in a sibling domain */
#define SD_OVERLAP 0x2000 /* sched_domains of this level overlap */
-extern int __weak arch_sd_sibiling_asym_packing(void);
+/* sched-domain levels */
+#define SD_LVL_INV 0x00 /* invalid */
+#define SD_LVL_SMT 0x01
+#define SD_LVL_MC 0x02
+#define SD_LVL_BOOK 0x04
+#define SD_LVL_CPU 0x08
+#define SD_LVL_NUMA 0x10
struct sched_domain_attr {
int relax_domain_level;
diff --git a/include/linux/topology.h b/include/linux/topology.h
index d3cf0d6..02a397a 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -66,115 +66,6 @@ int arch_update_cpu_topology(void);
#define PENALTY_FOR_NODE_WITH_CPUS (1)
#endif
-/*
- * Below are the 3 major initializers used in building sched_domains:
- * SD_SIBLING_INIT, for SMT domains
- * SD_CPU_INIT, for SMP domains
- *
- * Any architecture that cares to do any tuning to these values should do so
- * by defining their own arch-specific initializer in include/asm/topology.h.
- * A definition there will automagically override these default initializers
- * and allow arch-specific performance tuning of sched_domains.
- * (Only non-zero and non-null fields need be specified.)
- */
-
-#ifdef CONFIG_SCHED_SMT
-/* MCD - Do we really need this? It is always on if CONFIG_SCHED_SMT is,
- * so can't we drop this in favor of CONFIG_SCHED_SMT?
- */
-#define ARCH_HAS_SCHED_WAKE_IDLE
-/* Common values for SMT siblings */
-#ifndef SD_SIBLING_INIT
-#define SD_SIBLING_INIT (struct sched_domain) { \
- .min_interval = 1, \
- .max_interval = 2, \
- .busy_factor = 64, \
- .imbalance_pct = 110, \
- \
- .flags = 1*SD_LOAD_BALANCE \
- | 1*SD_BALANCE_NEWIDLE \
- | 1*SD_BALANCE_EXEC \
- | 1*SD_BALANCE_FORK \
- | 0*SD_BALANCE_WAKE \
- | 1*SD_WAKE_AFFINE \
- | 1*SD_SHARE_CPUPOWER \
- | 1*SD_SHARE_PKG_RESOURCES \
- | 0*SD_SERIALIZE \
- | 0*SD_PREFER_SIBLING \
- | arch_sd_sibling_asym_packing() \
- , \
- .last_balance = jiffies, \
- .balance_interval = 1, \
- .smt_gain = 1178, /* 15% */ \
-}
-#endif
-#endif /* CONFIG_SCHED_SMT */
-
-#ifdef CONFIG_SCHED_MC
-/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
-#ifndef SD_MC_INIT
-#define SD_MC_INIT (struct sched_domain) { \
- .min_interval = 1, \
- .max_interval = 4, \
- .busy_factor = 64, \
- .imbalance_pct = 125, \
- .cache_nice_tries = 1, \
- .busy_idx = 2, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
- \
- .flags = 1*SD_LOAD_BALANCE \
- | 1*SD_BALANCE_NEWIDLE \
- | 1*SD_BALANCE_EXEC \
- | 1*SD_BALANCE_FORK \
- | 0*SD_BALANCE_WAKE \
- | 1*SD_WAKE_AFFINE \
- | 0*SD_SHARE_CPUPOWER \
- | 1*SD_SHARE_PKG_RESOURCES \
- | 0*SD_SERIALIZE \
- , \
- .last_balance = jiffies, \
- .balance_interval = 1, \
-}
-#endif
-#endif /* CONFIG_SCHED_MC */
-
-/* Common values for CPUs */
-#ifndef SD_CPU_INIT
-#define SD_CPU_INIT (struct sched_domain) { \
- .min_interval = 1, \
- .max_interval = 4, \
- .busy_factor = 64, \
- .imbalance_pct = 125, \
- .cache_nice_tries = 1, \
- .busy_idx = 2, \
- .idle_idx = 1, \
- .newidle_idx = 0, \
- .wake_idx = 0, \
- .forkexec_idx = 0, \
- \
- .flags = 1*SD_LOAD_BALANCE \
- | 1*SD_BALANCE_NEWIDLE \
- | 1*SD_BALANCE_EXEC \
- | 1*SD_BALANCE_FORK \
- | 0*SD_BALANCE_WAKE \
- | 1*SD_WAKE_AFFINE \
- | 0*SD_SHARE_CPUPOWER \
- | 0*SD_SHARE_PKG_RESOURCES \
- | 0*SD_SERIALIZE \
- | 1*SD_PREFER_SIBLING \
- , \
- .last_balance = jiffies, \
- .balance_interval = 1, \
-}
-#endif
-
-#ifdef CONFIG_SCHED_BOOK
-#ifndef SD_BOOK_INIT
-#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
-#endif
-#endif /* CONFIG_SCHED_BOOK */
-
#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
DECLARE_PER_CPU(int, numa_node);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5ac63c9..53eda22 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5225,13 +5225,12 @@ enum s_alloc {
struct sched_domain_topology_level;
-typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
#define SDTL_OVERLAP 0x01
struct sched_domain_topology_level {
- sched_domain_init_f init;
+ int level;
sched_domain_mask_f mask;
int flags;
int numa_level;
@@ -5455,9 +5454,8 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
}
-int __weak arch_sd_sibling_asym_packing(void)
+void __weak arch_sd_customize(int level, struct sched_domain *sd, int cpu)
{
- return 0*SD_ASYM_PACKING;
}
/*
@@ -5471,28 +5469,6 @@ int __weak arch_sd_sibling_asym_packing(void)
# define SD_INIT_NAME(sd, type) do { } while (0)
#endif
-#define SD_INIT_FUNC(type) \
-static noinline struct sched_domain * \
-sd_init_##type(struct sched_domain_topology_level *tl, int cpu) \
-{ \
- struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu); \
- *sd = SD_##type##_INIT; \
- SD_INIT_NAME(sd, type); \
- sd->private = &tl->data; \
- return sd; \
-}
-
-SD_INIT_FUNC(CPU)
-#ifdef CONFIG_SCHED_SMT
- SD_INIT_FUNC(SIBLING)
-#endif
-#ifdef CONFIG_SCHED_MC
- SD_INIT_FUNC(MC)
-#endif
-#ifdef CONFIG_SCHED_BOOK
- SD_INIT_FUNC(BOOK)
-#endif
-
static int default_relax_domain_level = -1;
int sched_domain_level_max;
@@ -5587,89 +5563,140 @@ static const struct cpumask *cpu_smt_mask(int cpu)
}
#endif
-/*
- * Topology list, bottom-up.
- */
-static struct sched_domain_topology_level default_topology[] = {
+#ifdef CONFIG_NUMA
+static int sched_domains_numa_levels;
+static int *sched_domains_numa_distance;
+static struct cpumask ***sched_domains_numa_masks;
+static int sched_domains_curr_level;
+#endif
+
+static struct sched_domain *
+sd_init(struct sched_domain_topology_level *tl, int cpu)
+{
+ struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
+#ifdef CONFIG_NUMA
+ int sd_weight;
+#endif
+
+ *sd = (struct sched_domain) {
+ .min_interval = 1,
+ .max_interval = 4,
+ .busy_factor = 64,
+ .imbalance_pct = 125,
+
+ .flags = 1*SD_LOAD_BALANCE
+ | 1*SD_BALANCE_NEWIDLE
+ | 1*SD_BALANCE_EXEC
+ | 1*SD_BALANCE_FORK
+ | 0*SD_BALANCE_WAKE
+ | 1*SD_WAKE_AFFINE
+ | 0*SD_SHARE_CPUPOWER
+ | 0*SD_SHARE_PKG_RESOURCES
+ | 0*SD_SERIALIZE
+ | 0*SD_PREFER_SIBLING
+ ,
+
+ .last_balance = jiffies,
+ .balance_interval = 1,
+ };
+
+ switch (tl->level) {
#ifdef CONFIG_SCHED_SMT
- { sd_init_SIBLING, cpu_smt_mask, },
+ case SD_LVL_SMT:
+ sd->max_interval = 2;
+ sd->imbalance_pct = 110;
+
+ sd->flags |= SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
+
+ sd->smt_gain = 1178; /* ~15% */
+
+ SD_INIT_NAME(sd, SMT);
+ break;
#endif
#ifdef CONFIG_SCHED_MC
- { sd_init_MC, cpu_coregroup_mask, },
+ case SD_LVL_MC:
+ sd->cache_nice_tries = 1;
+ sd->busy_idx = 2;
+
+ sd->flags |= SD_SHARE_PKG_RESOURCES;
+
+ SD_INIT_NAME(sd, MC);
+ break;
#endif
+ case SD_LVL_CPU:
#ifdef CONFIG_SCHED_BOOK
- { sd_init_BOOK, cpu_book_mask, },
+ case SD_LVL_BOOK:
#endif
- { sd_init_CPU, cpu_cpu_mask, },
- { NULL, },
-};
+ sd->cache_nice_tries = 1;
+ sd->busy_idx = 2;
+ sd->idle_idx = 1;
-static struct sched_domain_topology_level *sched_domain_topology = default_topology;
-
-#define for_each_sd_topology(tl) \
- for (tl = sched_domain_topology; tl->init; tl++)
+ sd->flags |= SD_PREFER_SIBLING;
+ SD_INIT_NAME(sd, CPU);
+ break;
#ifdef CONFIG_NUMA
+ case SD_LVL_NUMA:
+ sd_weight = cpumask_weight(sched_domains_numa_masks
+ [tl->numa_level][cpu_to_node(cpu)]);
-static int sched_domains_numa_levels;
-static int *sched_domains_numa_distance;
-static struct cpumask ***sched_domains_numa_masks;
-static int sched_domains_curr_level;
+ sd->min_interval = sd_weight;
+ sd->max_interval = 2*sd_weight;
+ sd->busy_factor = 32;
-static inline int sd_local_flags(int level)
-{
- if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
- return 0;
+ sd->cache_nice_tries = 2;
+ sd->busy_idx = 3;
+ sd->idle_idx = 2;
- return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
-}
+ sd->flags |= SD_SERIALIZE;
-static struct sched_domain *
-sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
-{
- struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
- int level = tl->numa_level;
- int sd_weight = cpumask_weight(
- sched_domains_numa_masks[level][cpu_to_node(cpu)]);
-
- *sd = (struct sched_domain){
- .min_interval = sd_weight,
- .max_interval = 2*sd_weight,
- .busy_factor = 32,
- .imbalance_pct = 125,
- .cache_nice_tries = 2,
- .busy_idx = 3,
- .idle_idx = 2,
- .newidle_idx = 0,
- .wake_idx = 0,
- .forkexec_idx = 0,
-
- .flags = 1*SD_LOAD_BALANCE
- | 1*SD_BALANCE_NEWIDLE
- | 0*SD_BALANCE_EXEC
- | 0*SD_BALANCE_FORK
- | 0*SD_BALANCE_WAKE
- | 0*SD_WAKE_AFFINE
- | 0*SD_SHARE_CPUPOWER
- | 0*SD_SHARE_PKG_RESOURCES
- | 1*SD_SERIALIZE
- | 0*SD_PREFER_SIBLING
- | sd_local_flags(level)
- ,
- .last_balance = jiffies,
- .balance_interval = sd_weight,
- };
- SD_INIT_NAME(sd, NUMA);
- sd->private = &tl->data;
+ if (sched_domains_numa_distance[tl->numa_level] >
+ RECLAIM_DISTANCE)
+ sd->flags &= ~(SD_BALANCE_EXEC | SD_BALANCE_FORK |
+ SD_WAKE_AFFINE);
- /*
- * Ugly hack to pass state to sd_numa_mask()...
- */
- sched_domains_curr_level = tl->numa_level;
+ sd->balance_interval = sd_weight;
+
+ /*
+ * Ugly hack to pass state to sd_numa_mask()...
+ */
+ sched_domains_curr_level = tl->numa_level;
+
+ SD_INIT_NAME(sd, NUMA);
+ break;
+#endif
+ }
+ arch_sd_customize(tl->level, sd, cpu);
+ sd->private = &tl->data;
return sd;
}
+/*
+ * Topology list, bottom-up.
+ */
+static struct sched_domain_topology_level default_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+ { SD_LVL_SMT, cpu_smt_mask },
+#endif
+#ifdef CONFIG_SCHED_MC
+ { SD_LVL_MC, cpu_coregroup_mask },
+#endif
+#ifdef CONFIG_SCHED_BOOK
+ { SD_LVL_BOOK, cpu_book_mask },
+#endif
+ { SD_LVL_CPU, cpu_cpu_mask },
+ { SD_LVL_INV, },
+};
+
+static struct sched_domain_topology_level *sched_domain_topology =
+ default_topology;
+
+#define for_each_sd_topology(tl) \
+ for (tl = sched_domain_topology; tl->level; tl++)
+
+#ifdef CONFIG_NUMA
+
static const struct cpumask *sd_numa_mask(int cpu)
{
return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
@@ -5821,7 +5848,7 @@ static void sched_init_numa(void)
/*
* Copy the default topology bits..
*/
- for (i = 0; default_topology[i].init; i++)
+ for (i = 0; default_topology[i].level; i++)
tl[i] = default_topology[i];
/*
@@ -5829,7 +5856,6 @@ static void sched_init_numa(void)
*/
for (j = 0; j < level; i++, j++) {
tl[i] = (struct sched_domain_topology_level){
- .init = sd_numa_init,
.mask = sd_numa_mask,
.flags = SDTL_OVERLAP,
.numa_level = j,
@@ -5990,7 +6016,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
const struct cpumask *cpu_map, struct sched_domain_attr *attr,
struct sched_domain *child, int cpu)
{
- struct sched_domain *sd = tl->init(tl, cpu);
+ struct sched_domain *sd = sd_init(tl, cpu);
if (!sd)
return child;
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/