Message-ID: <CAKfTPtC1_t78hvOw9u-t1BqUZj_dO3WFYp2OjX=RFZP-A7zSbg@mail.gmail.com>
Date: Thu, 6 Mar 2014 16:32:35 +0800
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: "peterz@...radead.org" <peterz@...radead.org>,
	"mingo@...nel.org" <mingo@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"tony.luck@...el.com" <tony.luck@...el.com>,
	"fenghua.yu@...el.com" <fenghua.yu@...el.com>,
	"schwidefsky@...ibm.com" <schwidefsky@...ibm.com>,
	"james.hogan@...tec.com" <james.hogan@...tec.com>,
	"cmetcalf@...era.com" <cmetcalf@...era.com>,
	"benh@...nel.crashing.org" <benh@...nel.crashing.org>,
	"linux@....linux.org.uk" <linux@....linux.org.uk>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
	"preeti@...ux.vnet.ibm.com" <preeti@...ux.vnet.ibm.com>,
	"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>
Subject: Re: [PATCH 2/6] sched: rework of sched_domain topology definition

On 6 March 2014 01:09, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> On 05/03/14 07:18, Vincent Guittot wrote:
>>
>> We replace the old way to configure the scheduler topology with a new
>> method which enables a platform to declare additional levels (if needed).
>>
>> We still have a default topology table definition that can be used by
>> platforms that don't want more levels than the SMT, MC, CPU and NUMA
>> ones. This table can be overwritten by an arch which wants to add new
>> levels where a load balance makes sense, like a BOOK or powergating level.
>>
>> For each level, we need a function pointer that returns the cpumask for
>> each cpu, the flags configuration and a name. Each level must be a
>> subset of
>
>
> Maybe it's worth mentioning here that those flags are from the set of the
> sd topology flags, to distinguish them from the set of sd behavioural
> flags. The latter can't be set via this interface.

Yes, I will add the list of flags that can be set with the table.

>
>
>> the next one. The build sequence of the sched_domain will take care of
>> removing useless levels like those with 1 CPU and those with the same
>> CPU span and no more relevant information for load balancing than its
>> child.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>> ---
>>  arch/ia64/include/asm/topology.h |  24 ----
>>  arch/s390/include/asm/topology.h |   2 -
>>  arch/tile/include/asm/topology.h |  33 ------
>>  include/linux/sched.h            |  29 +++++
>>  include/linux/topology.h         | 128 +++------------------
>>  kernel/sched/core.c              | 227 +++++++++++++++++++-------------------
>>  6 files changed, 156 insertions(+), 287 deletions(-)
>>

[snip]

>> -
>> -#define for_each_sd_topology(tl)			\
>> -	for (tl = sched_domain_topology; tl->init; tl++)
>
>
> Why is sched_domains_curr_level now outside #ifdef CONFIG_NUMA?

It should not be, nor should its use in sd_init.

>
>
>> +static int sched_domains_curr_level;
>>
>>  #ifdef CONFIG_NUMA
>> -
>>  static int sched_domains_numa_levels;
>>  static int *sched_domains_numa_distance;
>>  static struct cpumask ***sched_domains_numa_masks;
>> -static int sched_domains_curr_level;
>> -
>> -static inline int sd_local_flags(int level)
>> -{
>> -	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
>> -		return 0;
>> +#endif
>>
>> -	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
>> -}
>> +/*
>> + * SD_flags allowed in topology descriptions.
>> + *
>> + * SD_SHARE_CPUPOWER      - describes SMT topologies
>> + * SD_SHARE_PKG_RESOURCES - describes shared caches
>> + * SD_NUMA                - describes NUMA topologies
>> + *
>> + * Odd one out:
>> + * SD_ASYM_PACKING        - describes SMT quirks
>> + */
>> +#define TOPOLOGY_SD_FLAGS		\
>> +	(SD_SHARE_CPUPOWER |		\
>> +	 SD_SHARE_PKG_RESOURCES |	\
>> +	 SD_NUMA |			\
>> +	 SD_ASYM_PACKING)
>>
>>  static struct sched_domain *
>> -sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
>> +sd_init(struct sched_domain_topology_level *tl, int cpu)
>>  {
>>  	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
>> -	int level = tl->numa_level;
>> -	int sd_weight = cpumask_weight(
>> -			sched_domains_numa_masks[level][cpu_to_node(cpu)]);
>> +	int sd_weight;
>> +
>
>
> The next line could be guarded by #ifdef CONFIG_NUMA. We still use #ifdef
> CONFIG_NUMA later in sd_init() though.
>
>> +	/*
>> +	 * Ugly hack to pass state to sd_numa_mask()...
>> +	 */
>> +	sched_domains_curr_level = tl->numa_level;
>> +
>> +	sd_weight = cpumask_weight(tl->mask(cpu));
>> +
>> +	if (WARN_ONCE(tl->sd_flags & ~TOPOLOGY_SD_FLAGS,
>> +			"wrong sd_flags in topology description\n"))
>> +		tl->sd_flags &= ~TOPOLOGY_SD_FLAGS;
>>
>>  	*sd = (struct sched_domain){
>>  		.min_interval		= sd_weight,
>>  		.max_interval		= 2*sd_weight,
>>  		.busy_factor		= 32,
>>  		.imbalance_pct		= 125,
>> -		.cache_nice_tries	= 2,
>> -		.busy_idx		= 3,
>> -		.idle_idx		= 2,
>> +
>> +		.cache_nice_tries	= 0,
>> +		.busy_idx		= 0,
>> +		.idle_idx		= 0,
>>  		.newidle_idx		= 0,
>>  		.wake_idx		= 0,
>>  		.forkexec_idx		= 0,
>
>
> Why do we want to explicitly set those indexes to 0 here? IMHO, the memory
> for *sd is zeroed out before. This is true for all data members which are
> set to 0 later in this function, including the | 0*SD_FOO. IMHO, that would
> make the code more readable.

I would say that it makes the configuration more readable and modifiable,
because you have the list of possible flags to set.

>
>
>>
>>  		.flags			= 1*SD_LOAD_BALANCE
>>  					| 1*SD_BALANCE_NEWIDLE
>> -					| 0*SD_BALANCE_EXEC
>> -					| 0*SD_BALANCE_FORK
>> +					| 1*SD_BALANCE_EXEC
>> +					| 1*SD_BALANCE_FORK
>>  					| 0*SD_BALANCE_WAKE
>> -					| 0*SD_WAKE_AFFINE
>> +					| 1*SD_WAKE_AFFINE
>>  					| 0*SD_SHARE_CPUPOWER
>>  					| 0*SD_SHARE_PKG_RESOURCES
>> -					| 1*SD_SERIALIZE
>> +					| 0*SD_SERIALIZE
>>  					| 0*SD_PREFER_SIBLING
>> -					| 1*SD_NUMA
>> -					| sd_local_flags(level)
>> +					| 0*SD_NUMA
>> +					| tl->sd_flags
>>  					,
>> +
>>  		.last_balance		= jiffies,
>>  		.balance_interval	= sd_weight,
>> +		.smt_gain		= 0,
>> +		.max_newidle_lb_cost	= 0,
>> +		.next_decay_max_lb_cost	= jiffies,
>> +#ifdef CONFIG_SCHED_DEBUG
>> +		.name			= tl->name,
>> +#endif
>>  	};
>> -	SD_INIT_NAME(sd, NUMA);
>> -	sd->private = &tl->data;
>>

[snip]

>>
>> +/*
>> + * Topology list, bottom-up.
>> + */
>> +static struct sched_domain_topology_level default_topology[] = {
>> +#ifdef CONFIG_SCHED_SMT
>> +	{ cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES, SD_INIT_NAME(SMT) },
>> +#endif
>> +#ifdef CONFIG_SCHED_MC
>> +	{ cpu_coregroup_mask, SD_SHARE_PKG_RESOURCES, SD_INIT_NAME(MC) },
>> +#endif
>> +#ifdef CONFIG_SCHED_BOOK
>> +	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
>> +#endif
>
>
> Never got the new name DIE for CPU? Might confuse people when they use
> /proc/sys/kernel/sched_domain/cpuX/domainY/name or sched_domain_debug_one().

In fact, CPU is also confusing because it's used for different things.
But if it makes things even more confusing, I can come back to CPU.

>
>
>> +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
>> +	{ NULL, },
>> +};
>> +
>> +struct sched_domain_topology_level *sched_domain_topology = default_topology;
>> +
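To make the interface discussed above concrete, here is a minimal, hypothetical
sketch of an arch-side override of the default table. It is not part of the
patch: it assumes the series exposes struct sched_domain_topology_level,
SD_INIT_NAME() and the sched_domain_topology pointer to arch code. The
cpu_cluster_mask() helper, the cluster_mask[] array, the CLS level name and
arch_init_sched_topology() are made up for illustration. Only flags from
TOPOLOGY_SD_FLAGS may be placed in the flags field, and the table must stay
ordered bottom-up with each level's cpumask a subset of the next one.

#include <linux/init.h>
#include <linux/cpumask.h>
#include <linux/sched.h>

/* Filled by the arch topology probe: CPUs sharing a cluster with each cpu. */
static cpumask_t cluster_mask[NR_CPUS];

static const struct cpumask *cpu_cluster_mask(int cpu)
{
	return &cluster_mask[cpu];
}

/* Same field layout as default_topology[]: mask fn, topology SD_flags, name. */
static struct sched_domain_topology_level arch_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES, SD_INIT_NAME(SMT) },
#endif
	/* extra level between SMT and DIE: cores sharing a cluster */
	{ cpu_cluster_mask, SD_SHARE_PKG_RESOURCES, SD_INIT_NAME(CLS) },
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

/*
 * Install the table before the scheduler builds its domains, e.g. from the
 * arch's smp_prepare_cpus() path.
 */
void __init arch_init_sched_topology(void)
{
	sched_domain_topology = arch_topology;
}

The degenerate-level cleanup mentioned in the changelog would then drop the
CLS level on machines where its span matches a neighbouring level, so the
table can describe the most detailed topology the arch supports and still
work on simpler systems.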