Message-ID: <20130801022319.4a6a977a@annuminas.surriel.com>
Date: Thu, 1 Aug 2013 02:23:19 -0400
From: Rik van Riel <riel@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Mel Gorman <mgorman@...e.de>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...nel.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: [PATCH,RFC] numa,sched: use group fault statistics in numa
placement
Here is a quick strawman showing how the group fault statistics could
be used to help pick the best node for a task. This is likely to be
quite suboptimal and in need of tweaking. My main goal is to get this to
Peter & Mel before it's breakfast time on their side of the Atlantic...

This goes on top of "sched, numa: Use {cpu, pid} to create task groups for shared faults".

Enjoy :)
Signed-off-by: Rik van Riel <riel@...hat.com>
---
kernel/sched/fair.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a06bef..fb2e229 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1135,8 +1135,9 @@ struct numa_group {
static void task_numa_placement(struct task_struct *p)
{
- int seq, nid, max_nid = -1;
- unsigned long max_faults = 0;
+ int seq, nid, max_nid = -1, max_group_nid = -1;
+ unsigned long max_faults = 0, max_group_faults = 0;
+ unsigned long total_faults = 0, total_group_faults = 0;
seq = ACCESS_ONCE(p->mm->numa_scan_seq);
if (p->numa_scan_seq == seq)
@@ -1148,7 +1149,7 @@ static void task_numa_placement(struct task_struct *p)
/* Find the node with the highest number of faults */
for (nid = 0; nid < nr_node_ids; nid++) {
- unsigned long faults = 0;
+ unsigned long faults = 0, group_faults = 0;
int priv, i;
for (priv = 0; priv < 2; priv++) {
@@ -1169,6 +1170,7 @@ static void task_numa_placement(struct task_struct *p)
if (p->numa_group) {
/* safe because we can only change our own group */
atomic_long_add(diff, &p->numa_group->faults[i]);
+ group_faults += atomic_long_read(&p->numa_group->faults[i]);
}
}
@@ -1176,11 +1178,35 @@ static void task_numa_placement(struct task_struct *p)
max_faults = faults;
max_nid = nid;
}
+
+ if (group_faults > max_group_faults) {
+ max_group_faults = group_faults;
+ max_group_nid = nid;
+ }
+
+ total_faults += faults;
+ total_group_faults += group_faults;
}
if (sched_feat(NUMA_INTERLEAVE))
task_numa_mempol(p, max_faults);
+ /*
+ * Should we stay on our own, or move in with the group?
+ * The absolute count of faults may not be useful, but comparing
+ * the fraction of accesses in each top node may give us a hint
+ * where to start looking for a migration target.
+ *
+ *  max_group_faults      max_faults
+ * ------------------ > ------------
+ * total_group_faults    total_faults
+ */
+ if (max_group_nid >= 0 && max_group_nid != max_nid) {
+ if (max_group_faults * total_faults >
+ max_faults * total_group_faults)
+ max_nid = max_group_nid;
+ }
+
/* Preferred node as the node with the most faults */
if (max_faults && max_nid != p->numa_preferred_nid) {
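
For anyone who wants to experiment with the heuristic outside the kernel, the decision made in the last hunk can be sketched as a small userspace function. This is only an illustration under stated assumptions: pick_preferred_node() and its array parameters are hypothetical names, not kernel interfaces, and the per-node counts are passed in directly instead of being accumulated from the task and numa_group fault arrays. The sketch uses 64-bit counters, which is a choice of the sketch and not something the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace sketch, not the kernel code.
 * task_faults[] and group_faults[] hold per-node fault counts for the
 * task and for its numa_group. Returns the preferred node id, or -1 if
 * the task itself recorded no faults.
 */
static int pick_preferred_node(const uint64_t *task_faults,
                               const uint64_t *group_faults,
                               size_t nr_nodes)
{
    int max_nid = -1, max_group_nid = -1;
    uint64_t max_faults = 0, max_group_faults = 0;
    uint64_t total_faults = 0, total_group_faults = 0;
    size_t nid;

    /* Find the top node for the task and for the group, plus totals. */
    for (nid = 0; nid < nr_nodes; nid++) {
        if (task_faults[nid] > max_faults) {
            max_faults = task_faults[nid];
            max_nid = (int)nid;
        }
        if (group_faults[nid] > max_group_faults) {
            max_group_faults = group_faults[nid];
            max_group_nid = (int)nid;
        }
        total_faults += task_faults[nid];
        total_group_faults += group_faults[nid];
    }

    /*
     * Compare max_group_faults / total_group_faults with
     * max_faults / total_faults by cross-multiplying, so no division
     * is needed and a zero total cannot cause a divide-by-zero.
     */
    if (max_group_nid >= 0 && max_group_nid != max_nid) {
        if (max_group_faults * total_faults >
            max_faults * total_group_faults)
            max_nid = max_group_nid;
    }

    return max_nid;
}
```

With task counts {40, 60} and group counts {900, 100}, the group is far more concentrated on node 0 (90%) than the task is on node 1 (60%), so the sketch returns node 0. With group counts {500, 500} the group has no clear favourite and the task's own top node wins.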