Message-ID: <4FFF4987.4050205@redhat.com>
Date: Thu, 12 Jul 2012 18:02:47 -0400
From: Rik van Riel <riel@...hat.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Dan Smith <danms@...ibm.com>,
Bharata B Rao <bharata.rao@...il.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH 14/26] sched, numa: Numa balancer
On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
At LSF/MM, there was a presentation comparing Peter's
NUMA code with Andrea's NUMA code. I believe the filter
quoted below is the main reason why Andrea's code performed
better in that particular test...
> + if (sched_feat(NUMA_BALANCE_FILTER)) {
> + /*
> + * Avoid moving ne's when we create a larger imbalance
> + * on the other end.
> + */
> + if ((imb->type & NUMA_BALANCE_CPU) &&
> + imb->cpu - cpu_moved < ne_cpu / 2)
> + goto next;
> +
> + /*
> + * Avoid migrating ne's when we'll know we'll push our
> + * node over the memory limit.
> + */
> + if (max_mem_load &&
> + imb->mem_load + mem_moved + ne_mem > max_mem_load)
> + goto next;
> + }
IIRC the test consisted of a 16GB NUMA system with two 8GB nodes.
It was running three KVM guests: two guests of 3GB memory each, and
one guest of 6GB.
With autonuma, the 6GB guest ended up on one node, and the
3GB guests on the other.
With sched numa, each node had a 3GB guest, and part of the 6GB guest.
There is a fundamental difference in the balancing between autonuma
and sched numa.
In sched numa, a process is moved over to the current node only if
the current node has space for it.
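Schematically, that rule is just a headroom check on the destination
node. A minimal user-space sketch (the struct and helper names are
mine, not taken from the patch set):

struct numa_node {
	long mem_total;		/* bytes of RAM on this node */
	long mem_used;		/* bytes currently resident  */
};

/* Only pull a task onto @dst if its memory footprint still fits. */
static int can_pull(const struct numa_node *dst, long task_mem)
{
	return dst->mem_total - dst->mem_used >= task_mem;
}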
Autonuma, on the other hand, operates more of a "hostage exchange"
policy, where a thread on one node is exchanged with a thread on
another node if it looks like that will reduce the overall number
of cross-node NUMA faults in the system.
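Very roughly, the exchange decision looks something like the sketch
below for a two-node system; the data structure, fault bookkeeping
and size threshold are made up for illustration and are not
autonuma's actual implementation:

#include <stdlib.h>

struct numa_task {
	long mem;		/* resident memory of the task  */
	long faults[2];		/* recent NUMA faults, per node */
};

/* Remote faults task @t would take while running on node @nid. */
static long remote_faults(const struct numa_task *t, int nid)
{
	return t->faults[!nid];
}

/*
 * Swap @a (currently on node 0) with @b (currently on node 1) only if
 * the swap reduces the total number of remote faults, and the two
 * tasks are close enough in size that neither node gains much memory.
 */
static int should_exchange(const struct numa_task *a,
			   const struct numa_task *b)
{
	long before = remote_faults(a, 0) + remote_faults(b, 1);
	long after  = remote_faults(a, 1) + remote_faults(b, 0);

	return after < before && labs(a->mem - b->mem) < (256L << 20);
}

Because memory moves in both directions at once, an exchange can make
progress even when neither node has free space for a one-directional
move.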
I am not sure how to do a "hostage exchange" algorithm with
sched numa, but it would seem like it could be necessary in order
for some workloads to converge on a sane configuration.
After all, with only about 2GB free on each node, you will never
be able to move either a 3GB guest or the remaining part of a 6GB guest...
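To put rough numbers on it (assuming the 6GB guest ends up split
about evenly, which is my guess at the steady state):

  node 0: 3GB guest + ~3GB of the 6GB guest = ~6GB used, ~2GB free
  node 1: 3GB guest + ~3GB of the 6GB guest = ~6GB used, ~2GB free

Neither the whole 3GB guest nor the remaining ~3GB of the 6GB guest
fits in ~2GB of headroom, so every one-directional move gets filtered
out and the layout never converges.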
Any ideas?