linux-kernel - RE: + sched-use-tasklet-to-call-balancing.patch added to -mm tree


Message-ID: <000201c706ee$a9992e80$a081030a@amr.corp.intel.com>
Date:	Sun, 12 Nov 2006 22:40:51 -0800
From:	"Chen, Kenneth W" <kenneth.w.chen@...el.com>
To:	"'Christoph Lameter'" <clameter@....com>
Cc:	"Ingo Molnar" <mingo@...e.hu>,
	"Siddha, Suresh B" <suresh.b.siddha@...el.com>, <akpm@...l.org>,
	<mm-commits@...r.kernel.org>, <nickpiggin@...oo.com.au>,
	<linux-kernel@...r.kernel.org>
Subject: RE: + sched-use-tasklet-to-call-balancing.patch added to -mm tree

Christoph Lameter wrote on Sunday, November 12, 2006 9:45 PM
> > (2) we should initiate load balance within a domain only from the least
> >     loaded group.
> 
> This would mean we would have to determine the least loaded group first.

Well, find_busiest_group() scans every single bloody CPU in the system at
the highest sched_domain level.  Since this function is capable of finding
the busiest group within a domain, it should be able to determine the least
loaded group for free, because it has already scanned every group within
the domain.


> > Part of all this problem probably stemmed from "load balance" being
> > incapable of performing l-d between an arbitrary pair of CPUs, and from
> > the load scan and the actual l-d action being tightly tied together.  On
> > top of that, l-d is really a pull operation to the currently running CPU.
> > All these limitations dictate that every CPU somehow has to scan and
> > pull.  It is extremely inefficient on a large system.
> 
> Right. However, if we follow this line of thought then we will be 
> redesigning the load balancing logic.

It won't be a bad idea to redesign it ;-)

There are a number of other oddities in its design besides what has been
identified so far:

(1) Several sched_groups are statically declared, so they reside on the
    boot node. I would expect cross-node memory access to be expensive,
    and every CPU accesses these data structures repeatedly.

    static struct sched_group sched_group_cpus[NR_CPUS];
    static struct sched_group sched_group_core[NR_CPUS];
    static struct sched_group sched_group_phys[NR_CPUS];

(2) Load balance staggering. A number of people have pointed out that it is
    overdone.

(3) The for_each_domain() loop in rebalance_tick() behaves differently from
    the one in idle_balance(): it traverses every sched domain even if a
    lower-level domain has already succeeded in moving some tasks.  I would
    expect it either to break out of the loop as idle_balance() does, or to
    somehow update the current CPU's load so that it gets an accurate load
    value when doing l-d at the next level. Currently, it does neither.