linux-kernel - Re: [PATCH 22/35] autonuma: sched_set_autonuma_need

Message-ID: <20120529182440.GN21339@redhat.com>
Date:	Tue, 29 May 2012 20:24:40 +0200
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH 22/35] autonuma: sched_set_autonuma_need_balance

On Tue, May 29, 2012 at 07:43:27PM +0200, Peter Zijlstra wrote:
> On Tue, 2012-05-29 at 19:33 +0200, Andrea Arcangeli wrote:
> > So the cost on a 24-way SMP 
> 
> is irrelevant.. also, not every cpu gets to the 24 cpu domain, just 2
> do.
> 
> When you do for_each_cpu() think at least 4096, if you do
> for_each_node() think at least 256.
> 
> Add to that the knowledge that doing 4096 remote memory accesses will
> cost multiple jiffies, then realize you're wanting to do that with
> preemption disabled.
> 
> That's just a very big no go.

I am thinking 4096/256, which is why I mentioned it's a 24-way system.
I think the hackbench run should be repeated on a much bigger system to
see what happens; I'm not saying it'll work fine already.

But from autonuma13 to autonuma14 it's a world of difference in
hackbench terms, to the point that the cost is zero on a 24-way.

My idea down the road, for multi-hop systems, is to balance across the
nodes 1 hop away at the regular load_balance interval, across the nodes
2 hops away at half that frequency, across 3 hops at 1/4th of it, and
so on. That change alone should help tremendously with 256 nodes and
5 or 6 hops, and it should be quite easy to implement too.
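
To make the halving schedule concrete, here is a minimal userspace
sketch. It is not code from the autonuma patches: the helper names
(should_balance_at_hops, max_hops_for_tick) and the idea of a plain
pass counter ("tick") are assumptions made only for illustration. It
shows nodes 1 hop away considered on every pass, 2 hops on every 2nd
pass, 3 hops on every 4th pass, and so on.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the autonuma patches).
 * Returns true if the balancing pass number `tick` should also look at
 * nodes `hops` hops away: 1 hop on every pass, 2 hops on every 2nd
 * pass, 3 hops on every 4th pass, i.e. a period of 2^(hops - 1) passes.
 */
static bool should_balance_at_hops(unsigned long tick, unsigned int hops)
{
	if (hops == 0)
		return true;	/* local node: always considered */

	/* Avoid an undefined shift for absurdly large hop counts. */
	if (hops - 1 >= sizeof(unsigned long) * CHAR_BIT)
		return tick == 0;

	return (tick % (1UL << (hops - 1))) == 0;
}

/*
 * Hypothetical helper: the farthest hop distance (capped at max_hops,
 * assumed to be >= 1) that pass number `tick` should reach.
 */
static unsigned int max_hops_for_tick(unsigned long tick,
				      unsigned int max_hops)
{
	unsigned int hops = 1;

	while (hops < max_hops && should_balance_at_hops(tick, hops + 1))
		hops++;

	return hops;
}
```

With this schedule a pass reaches the full 5 or 6 hop diameter only
once every 16 or 32 passes, so most passes stay within 1 or 2 hops.
A real implementation would derive the distances from the NUMA
topology rather than from a bare counter.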

knuma_migrated also needs to learn more about the hops, and it should
probably scan the lru heads coming from the closer hops at a higher
frequency.

The code is not "hops" aware yet, and certainly there are still lots of
optimizations to do for the very big systems. I think it's already
quite ideal right now for most servers, and I don't see blockers in
optimizing it for the extremely big cases (and I expect it would
already work better than nothing in the extreme setups). I removed
[RFC] because I'm quite happy with it now (there were things I wasn't
happy with before), but I didn't mean it's finished.