Message-ID: <5000347E.1050301@hp.com>
Date: Fri, 13 Jul 2012 07:45:18 -0700
From: Don Morris <don.morris@...com>
To: Rik van Riel <riel@...hat.com>
CC: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Dan Smith <danms@...ibm.com>,
Bharata B Rao <bharata.rao@...il.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH 14/26] sched, numa: Numa balancer
On 07/12/2012 03:02 PM, Rik van Riel wrote:
> On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
>
> At LSF/MM, there was a presentation comparing Peter's
> NUMA code with Andrea's NUMA code. I believe this is
> the main reason why Andrea's code performed better in
> that particular test...
>
>> + if (sched_feat(NUMA_BALANCE_FILTER)) {
>> + /*
>> + * Avoid moving ne's when we create a larger imbalance
>> + * on the other end.
>> + */
>> + if ((imb->type & NUMA_BALANCE_CPU) &&
>> + imb->cpu - cpu_moved < ne_cpu / 2)
>> + goto next;
>> +
>> +		/*
>> +		 * Avoid migrating ne's when we know we'll push our
>> +		 * node over the memory limit.
>> +		 */
>> + if (max_mem_load &&
>> + imb->mem_load + mem_moved + ne_mem > max_mem_load)
>> + goto next;
>> + }
>
> IIRC the test consisted of a 16GB NUMA system with two 8GB nodes.
> It was running three KVM guests: two guests of 3GB memory each, and
> one guest of 6GB.
How many cpus per guest (host threads) and how many physical/logical
cpus per node on the host? Any comparisons with a situation where
the memory would fit within nodes but the scheduling load would
be too high?
Don
>
> With autonuma, the 6GB guest ended up on one node, and the
> 3GB guests on the other.
>
> With sched numa, each node had a 3GB guest, and part of the 6GB guest.
>
> There is a fundamental difference in the balancing between autonuma
> and sched numa.
>
> In sched numa, a process is moved over to the current node only if
> the current node has space for it.
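This capacity-based rule can be sketched as below. This is only an illustrative reduction of the quoted NUMA_BALANCE_FILTER hunk, not code from the patch; the names node_free_mem and task_mem are made up for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of sched numa's placement rule: a task is moved
 * to a node only when that node's free memory can hold the whole task.
 * With ~2GB free per node, neither a 3GB guest nor half of a 6GB guest
 * ever qualifies, so the system cannot converge.
 */
bool can_move_task(long node_free_mem, long task_mem)
{
	/* Move only if the destination has room for the entire task. */
	return task_mem <= node_free_mem;
}
```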
>
> Autonuma, on the other hand, operates more of a "hostage exchange"
> policy, where a thread on one node is exchanged with a thread on
> another node, if it looks like that will reduce the overall number
> of cross-node NUMA faults in the system.
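The exchange criterion described above might look something like the following. This is a guess at the shape of the test, not autonuma's actual code; the per-task, per-node fault counts are hypothetical inputs.

```c
#include <stdbool.h>

/*
 * Illustrative "hostage exchange" check: task a runs on node 0, task b
 * runs on node 1. x_faults_nY is the number of NUMA faults task x
 * incurs against memory on node Y. Swap the two tasks only when doing
 * so lowers the total number of cross-node (remote) faults.
 */
bool swap_reduces_faults(long a_faults_n0, long a_faults_n1,
			 long b_faults_n0, long b_faults_n1)
{
	/* Remote faults with the current placement (a@n0, b@n1). */
	long before = a_faults_n1 + b_faults_n0;
	/* Remote faults after exchanging the two tasks (a@n1, b@n0). */
	long after  = a_faults_n0 + b_faults_n1;

	return after < before;
}
```

Because the two tasks trade places, the exchange needs no free capacity on either node, which is exactly what lets it escape the "2GB free on each node" deadlock.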
>
> I am not sure how to do a "hostage exchange" algorithm with
> sched numa, but it would seem like it could be necessary in order
> for some workloads to converge on a sane configuration.
>
> After all, with only about 2GB free on each node, you will never
> get to move either a 3GB guest, or parts of a 6GB guest...
>
> Any ideas?
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/