Date:	Thu, 22 Mar 2012 19:49:25 +0100
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	Dan Smith <danms@...ibm.com>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Mike Galbraith <efault@....de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Bharata B Rao <bharata.rao@...il.com>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC] AutoNUMA alpha6

Hi Dan,

On Thu, Mar 22, 2012 at 03:27:35PM +0100, Andrea Arcangeli wrote:
> current code would optimally perform, if all nodes are busy and there
> aren't idle cores (or only idle siblings). I guess I'll leave the HT
> optimizations for later. I probably shall measure this again with HT off.

I added the latest virt measurements with KVM for the kernel build and
memhog. I also measured how much I'd save by increasing the
knuma_scand pass interval (scan_sleep_pass_millisecs) from the 10sec
default (a value of 5000) to 30sec. I also tried 1min, but it was
within the error range of the 30sec result. Even 10sec -> 30sec is
almost within the error range, showing the cost of scanning is really
tiny. Luckily the numbers were totally stable: running a -j16 build
loop on both VMs (each VM had 12 vcpus on a host with 24 CPUs), the
error was less than 1sec for each kernel build (on tmpfs obviously,
with a totally stripped-down userland in both guest and host).

http://www.kernel.org/pub/linux/kernel/people/andrea/autonuma/autonuma_bench-20120322.pdf

Slides 11 and 12.
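For reference, the knuma_scand interval change above can be sketched as below. This is only an illustration, not part of the patchset: the sysfs path is an assumption based on the AutoNUMA series' tunable name, and the value-to-interval conversion follows the mail's figures (a value of 5000 corresponding to a 10sec pass).

```shell
# Hypothetical sysfs path, assumed from the AutoNUMA patchset's tunable name;
# verify against the actual patch series before use.
SYSFS=/sys/kernel/mm/autonuma/knuma_scand/scan_sleep_pass_millisecs

# Per the mail, a value of 5000 corresponds to a 10sec pass interval,
# i.e. the effective interval is twice the value in milliseconds.
ms=15000                                   # 15000 -> 30sec pass interval
echo "pass interval: $((ms * 2 / 1000))sec"

# To actually apply it (needs root):
#   echo $ms > $SYSFS
```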

This is with THP on. With THP off things would likely be different,
but hey, THP off is like 20% slower or more on a kernel build in a
guest in the first place.

I'm satisfied with the benchmark results so far and more will come
soon, but now it's time to go back to coding and add THP native
migration. That will benefit everyone, from cpuset in userland to
numa/sched.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
