Message-ID: <20091123114339.GB2287@wotan.suse.de>
Date:	Mon, 23 Nov 2009 12:43:39 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: newidle balancing in NUMA domain?

On Mon, Nov 23, 2009 at 12:36:15PM +0100, Peter Zijlstra wrote:
> On Mon, 2009-11-23 at 12:22 +0100, Nick Piggin wrote:
> > Hi,
> > 
> > I wonder why it was decided to do newidle balancing in the NUMA
> > domain? And with newidle_idx == 0 at that.
> > 
> > This means that every time the CPU goes idle, every CPU in the
> > system gets a remote cacheline or two hit. Not very nice O(n^2)
> > behaviour on the interconnect. Not to mention trashing our
> > NUMA locality.
> > 
> > And then I see some proposal to do ratelimiting of newidle
> > balancing :( Seems like hack upon hack making behaviour much more
> > complex.
> > 
> > One "symptom" of bad mutex contention can be that increasing the
> > balancing rate can help a bit to reduce idle time (because it
> > can get the woken thread which is holding a semaphore to run ASAP
> > after we run out of runnable tasks in the system due to them 
> > hitting contention on that semaphore).
> > 
> > I really hope this change wasn't done in order to help -rt or
> > something sad like sysbench on MySQL.
> 
> IIRC this was kbuild and other spreading workloads that want this.
> 
> the newidle_idx=0 thing is because I frequently saw it make funny
> balance decisions based on old load numbers, like f_b_g() selecting a
> group that didn't even have tasks in anymore.

Well, it is just a damping factor on runqueue fluctuations. If the
group recently had load, then the point of the idx is to account
for this. On the other hand, if we have other groups that are also
above the idx-damped average, it would make sense to use them
instead (ie. cull source groups with no pullable tasks).


> We went without newidle for a while, but then people started complaining
> about that kbuild time, and there is an x264 encoder thing that loses
> tons of throughput.

So... these were due to what? Other changes in domains balancing?
Changes in CFS? Something else? Or were they comparisons versus
other operating systems?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
