
Message-ID: <20090818131558.GO29515@alberich.amd.com>
Date:	Tue, 18 Aug 2009 15:15:58 +0200
From:	Andreas Herrmann <andreas.herrmann3@....com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/12] cleanup __build_sched_domains()

On Tue, Aug 18, 2009 at 01:16:44PM +0200, Ingo Molnar wrote:
> 
> * Andreas Herrmann <andreas.herrmann3@....com> wrote:
> 
> > Hi,
> > 
> > Following patches try to make __build_sched_domains() less ugly 
> > and more readable. They shouldn't be harmful. Thus I think they 
> > can be applied for .32.
> > 
> > Patches are against tip/master as of today.
> > 
> > FYI, I need those patches as a base for introducing a new domain
> > level for multi-node CPUs, for which I intend to send patches as
> > RFC asap.
> 
> Very nice cleanups!
> 
> Magny-Cours indeed will need one more sched-domains level, 
> something like:
> 
>    [smt thread]
>    core
>    internal numa node
>    cpu socket
>    external numa node 

My current approach is to place the NUMA node domain either below the
CPU domain (in the case of a multi-node CPU, where SRAT describes each
internal node as a NUMA node) or to leave it as is, as the top-level
domain (e.g. with node interleaving or missing/broken ACPI SRAT
detection).

Sched domain levels, from lowest to highest (note SMT==SIBLING,
NODE==NUMA), are shown below for two cases:

(1) groups in NUMA domain are subsets of groups in CPU domain
(2) groups in NUMA domain are supersets of groups in CPU domain

(1)         | (2)
------------|-------------------
SMT         | SMT
MC          | MC
MN (new)    | MN
NUMA        | CPU
CPU         | NUMA
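The choice between the two orderings in the table comes down to a subset test on CPU spans. A minimal user-space sketch, assuming each span fits in a 64-bit mask (the kernel uses struct cpumask instead); numa_cpu_order(), domain_levels() and enum numa_order are hypothetical names, not code from the patch set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: spans are plain 64-bit masks (at most 64 CPUs). */
enum numa_order {
	NUMA_BELOW_CPU = 1,	/* case (1): NUMA groups are subsets of CPU groups */
	NUMA_ABOVE_CPU = 2,	/* case (2): NUMA groups are supersets of CPU groups */
	NUMA_CROSSES_CPU = 0,	/* neither holds: spans overlap without nesting */
};

/* True if every CPU in a is also in b. */
static int mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/* node_span / socket_span: CPUs sharing a NUMA node / a socket with one CPU. */
static enum numa_order numa_cpu_order(uint64_t node_span, uint64_t socket_span)
{
	assert(node_span != 0 && socket_span != 0);

	/* Socket split into several SRAT-described nodes (multi-node CPU). */
	if (node_span != socket_span && mask_subset(node_span, socket_span))
		return NUMA_BELOW_CPU;
	/* One node per socket, node interleaving, or no usable SRAT. */
	if (mask_subset(socket_span, node_span))
		return NUMA_ABOVE_CPU;
	return NUMA_CROSSES_CPU;
}

/* Domain levels from lowest to highest, as in the table. */
static const char *const levels_case1[] = { "SMT", "MC", "MN", "NUMA", "CPU" };
static const char *const levels_case2[] = { "SMT", "MC", "MN", "CPU", "NUMA" };

static const char *const *domain_levels(enum numa_order order)
{
	switch (order) {
	case NUMA_BELOW_CPU:
		return levels_case1;
	case NUMA_ABOVE_CPU:
		return levels_case2;
	default:
		return NULL;	/* not representable as a strict hierarchy */
	}
}
```

For a 12-core socket split into two 6-core nodes, numa_cpu_order(0x3f, 0xfff) yields case (1). On a conventional one-node-per-socket system the spans are equal, and the sketch deliberately picks case (2), keeping NUMA as the top-level domain.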

I'll also introduce a new parameter, sched_mn_power_savings, which
causes tasks to be scheduled on one socket until its capacity is
reached; only then are other sockets used as well.
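The packing behaviour described for sched_mn_power_savings can be sketched as a socket-selection rule. This is a simplified illustration, assuming a plain task-count "capacity" per socket in place of the scheduler's own group-capacity calculation; struct socket_load, pick_socket() and least_loaded() are hypothetical and not taken from the actual patches:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket load summary used only for this sketch. */
struct socket_load {
	unsigned int nr_running;	/* tasks currently on the socket */
	unsigned int capacity;		/* tasks it takes before it is "full" */
};

/* Index of the socket with the fewest running tasks (ties: lowest index). */
static size_t least_loaded(const struct socket_load *s, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		if (s[i].nr_running < s[best].nr_running)
			best = i;
	}
	return best;
}

/*
 * power_savings != 0: pack onto the first socket with spare capacity and
 * fall back to spreading only once every socket is full.
 * power_savings == 0: always spread to the least loaded socket.
 */
static size_t pick_socket(const struct socket_load *s, size_t n,
			  int power_savings)
{
	assert(s != NULL && n > 0);

	if (power_savings) {
		for (size_t i = 0; i < n; i++) {
			if (s[i].nr_running < s[i].capacity)
				return i;
		}
	}
	return least_loaded(s, n);
}
```

With two sockets loaded {3 of 6} and {0 of 6}, the power-savings rule keeps filling socket 0 while the default rule picks the idle socket 1; once socket 0 reaches 6 tasks, both rules move to socket 1.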

> ... which is certainly interesting, especially since the hierarchy
> possibly 'crosses', i.e. we might have the two internal numa nodes
> share an L2 or L3 cache, right?

> I'd also not be surprised if the load-balancer needed some care to 
> properly handle such a setup.

It does need some care, and getting it to work in all cases (i.e.
NUMA, non-NUMA, NUMA without SRAT) gave me some headaches. My current
code (which still needs to be split into proper patches for
submission) works fine in all but one case, which I am still
debugging.
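The "crossing" hierarchy raised above can be made concrete: sched domain levels are expected to nest, i.e. each level's span must be contained in the span of the level above it. A hypothetical user-space helper, assuming 64-bit CPU masks and independent of the kernel's own topology checks, that reports where a proposed level ordering breaks that rule:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * spans[0] is the lowest level (e.g. SMT), spans[n-1] the highest.
 * Returns the index of the first level whose span is not contained in
 * the span of the level above it, or -1 if all levels nest properly.
 */
static int first_nesting_violation(const uint64_t *spans, size_t n)
{
	assert(spans != NULL);

	for (size_t i = 0; i + 1 < n; i++) {
		if (spans[i] & ~spans[i + 1])	/* CPUs missing from the parent */
			return (int)i;
	}
	return -1;
}
```

If a cache were shared across both internal nodes of a 12-core socket, an ordering {SMT 0x3, MC 0xfff, NODE 0x3f, CPU 0xfff} would fail at index 1, because the cache-sharing span is wider than the node above it; such a topology cannot be expressed by simply stacking levels in that order.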

The failing case is a normal (non-multi-node) NUMA system, on which
switching to the power policy does not take effect for tasks that are
already running; only newly created tasks are scheduled according to
the power policy.

> It's all welcome work in any case, and for .32.


Thanks,

Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632
