Date: Sat, 07 Jul 2012 14:26:39 -0400
From: Rik van Riel <riel@...hat.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Dan Smith <danms@...ibm.com>,
Bharata B Rao <bharata.rao@...il.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC][PATCH 14/26] sched, numa: Numa balancer
On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
> +/*
> + * Assumes symmetric NUMA -- that is, each node is of equal size.
> + */
> +static void set_max_mem_load(unsigned long load)
> +{
> + unsigned long old_load;
> +
> + spin_lock(&max_mem_load.lock);
> + old_load = max_mem_load.load;
> + if (!old_load)
> + old_load = load;
> + max_mem_load.load = (old_load + load) >> 1;
> + spin_unlock(&max_mem_load.lock);
> +}
The above in your patch kind of conflicts with this bit
from patch 6/26:
+ /*
+ * Migration allocates pages in the highest zone. If we cannot
+ * do so then migration (at least from node to node) is not
+ * possible.
+ */
+ if (vma->vm_file &&
+ gfp_zone(mapping_gfp_mask(vma->vm_file->f_mapping))
+ < policy_zone)
+ return 0;
Looking at how the memory load code is used, I wonder
if it would make sense to count "zone size - free - inactive
file" pages instead?
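To make that suggestion concrete, here is a rough userspace sketch of the metric. It is only an illustration: the struct, its field names, and the helper are hypothetical and do not correspond to anything in the patch set or to existing kernel interfaces. It treats free pages and inactive file pages as memory that is effectively available, and counts everything else as load.

```c
/*
 * Userspace sketch only. The struct, its fields and the helper name
 * are hypothetical; they are not taken from the patch or the kernel.
 */
struct node_mem_sample {
	unsigned long present_pages;       /* total pages in the node's zones */
	unsigned long free_pages;          /* pages that are currently free */
	unsigned long inactive_file_pages; /* page cache that is cheap to reclaim */
};

/* Memory load as "zone size - free - inactive file", clamped at zero. */
static unsigned long node_mem_load(const struct node_mem_sample *s)
{
	unsigned long available = s->free_pages + s->inactive_file_pages;

	/* Guard against inconsistent samples so the subtraction cannot wrap. */
	if (available >= s->present_pages)
		return 0;

	return s->present_pages - available;
}
```

With a node of 1000 pages, 300 free and 200 inactive file pages, this would report a load of 500 pages, so cheaply reclaimable page cache would no longer count against the node.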
> + /*
> + * Avoid migrating ne's when we'll know we'll push our
> + * node over the memory limit.
> + */
> + if (max_mem_load &&
> + imb->mem_load + mem_moved + ne_mem > max_mem_load)
> + goto next;
> +static void numa_balance(struct node_queue *this_nq)
> +{
> + struct numa_imbalance imb;
> + int busiest;
> +
> + busiest = find_busiest_node(this_nq->node, &imb);
> + if (busiest == -1)
> + return;
> +
> + if (imb.cpu <= 0 && imb.mem <= 0)
> + return;
> +
> + move_processes(nq_of(busiest), this_nq, &imb);
> +}
You asked how and why Andrea's algorithm converges.
After looking at both patch sets for a while, and asking
for clarification, I think I can see how his code converges.
It is not yet clear to me how and why your code converges.
I see some dual bin packing (CPU & memory) heuristics, but
it is not at all clear to me how they interact, especially
when workloads are going active and idle on a regular basis.
--
All rights reversed