Message-Id: <1258987059.6193.73.camel@marge.simson.net>
Date:	Mon, 23 Nov 2009 15:37:39 +0100
From:	Mike Galbraith <efault@....de>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: newidle balancing in NUMA domain?

On Mon, 2009-11-23 at 12:22 +0100, Nick Piggin wrote:
> Hi,
> 
> I wonder why it was decided to do newidle balancing in the NUMA
> domain? And with newidle_idx == 0 at that.
> 
> This means that every time the CPU goes idle, every CPU in the
> system gets a remote cacheline or two hit. Not very nice O(n^2)
> behaviour on the interconnect. Not to mention trashing our
> NUMA locality.

Painful on little boxen too if left unchained.
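
(Toy model of the interconnect cost, as a standalone C sketch, purely
illustrative: the idle-entry rate and cachelines-per-runqueue below are
made-up assumptions, not measurements. The point is only that if every
newly-idle CPU polls every other CPU's runqueue, aggregate remote traffic
scales as N*(N-1).)

#include <stdio.h>

int main(void)
{
	/* assumptions, not measurements */
	const long idles_per_sec = 10000;	/* idle entries per CPU per second */
	const int lines_per_rq = 2;		/* remote cachelines touched per runqueue scanned */

	for (int cpus = 4; cpus <= 256; cpus *= 2) {
		long long remote = (long long)cpus * (cpus - 1)
				 * idles_per_sec * lines_per_rq;
		printf("%4d cpus: ~%lld remote cacheline reads/sec\n",
		       cpus, remote);
	}
	return 0;
}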
 
> And then I see some proposal to do ratelimiting of newidle
> balancing :( Seems like hack upon hack making behaviour much more
> complex.

That's mine, and yeah, it is hackish.  It just keeps newidle at bay for
high speed switchers while keeping it available to kick start CPUs for
fork/exec loads.  Suggestions welcome.  I have a threaded testcase
(x264) where turning the thing off costs ~40% throughput.  Take that
same testcase (or ilk) to a big NUMA beast, and performance will very
likely suck just as bad as it does on my little Q6600 box.

Other than that, I'd be most happy to see the thing crawl back into its
cave and _die_ despite the little gain it provides for a kbuild.  It has
been (is) very annoying.
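
(FWIW, the general shape of the rate limit, as a standalone C sketch.
The names, the decay factor and the 0.5 ms cutoff are illustrative
assumptions, not the actual patch: track a decayed average of how long
the CPU stays idle, and skip newidle balancing when that average is
shorter than the assumed cost of pulling a task.)

#include <stdbool.h>
#include <stdio.h>

struct cpu_stats {
	unsigned long long avg_idle_ns;		/* decayed average idle period */
};

#define MIGRATION_COST_NS 500000ULL		/* assumed task-pull cost: 0.5 ms */

static void note_idle_period(struct cpu_stats *cs, unsigned long long idle_ns)
{
	/* simple 7/8 decay toward the latest sample */
	cs->avg_idle_ns = (cs->avg_idle_ns * 7 + idle_ns) / 8;
}

static bool should_newidle_balance(const struct cpu_stats *cs)
{
	/* high-speed switchers (short idle periods) skip the balance */
	return cs->avg_idle_ns >= MIGRATION_COST_NS;
}

int main(void)
{
	struct cpu_stats cs = { .avg_idle_ns = 1000000 };	/* start at 1 ms */
	unsigned long long samples[] = { 800000, 50000, 30000, 20000, 2000000 };

	for (unsigned int i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		note_idle_period(&cs, samples[i]);
		printf("idle %7lluns  avg %7lluns  balance? %s\n",
		       samples[i], cs.avg_idle_ns,
		       should_newidle_balance(&cs) ? "yes" : "no");
	}
	return 0;
}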

> One "symptom" of bad mutex contention can be that increasing the
> balancing rate can help a bit to reduce idle time (because it
> can get the woken thread which is holding a semaphore to run ASAP
> after we run out of runnable tasks in the system due to them 
> hitting contention on that semaphore).

Yes, when mysql+oltp starts jamming up, load balancing helps bust up the
logjam somewhat, but that's not at all why newidle was activated.

> I really hope this change wasn't done in order to help -rt or
> something sad like sysbench on MySQL.

Newidle was activated to improve fork/exec CPU utilization.  A nasty
side effect is that it tries to rip other loads to tatters.

> And btw, I'll stay out of mentioning anything about CFS development,
> but it really sucks to be continually making significant changes to
> domains balancing *and* per-runqueue scheduling at the same time :(
> It makes it even more difficult to bisect things.

Yeah, balancing got jumbled up with desktop tweakage.  Much fallout this
round, and some things still to be fixed back up.

	-Mike

