Message-Id: <20180723111622.GG30345@linux.vnet.ibm.com>
Date: Mon, 23 Jul 2018 04:16:22 -0700
From: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Rik van Riel <riel@...riel.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to
the same node.
* Peter Zijlstra <peterz@...radead.org> [2018-07-23 12:38:30]:
> On Wed, Jun 20, 2018 at 10:32:52PM +0530, Srikar Dronamraju wrote:
> > Since task migration under numa balancing can happen in parallel, more
> > than one task might choose to move to the same node at the same time.
> > This can cause load imbalances at the node level.
> >
> > The problem is more likely if there are more cores per node or more
> > nodes in system.
> >
> > Use a per-node variable to indicate if task migration
> > to the node under numa balance is currently active.
> > This per-node variable will not track swapping of tasks.
>
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 50c7727..87fb20e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1478,11 +1478,22 @@ struct task_numa_env {
> >  static void task_numa_assign(struct task_numa_env *env,
> >  			     struct task_struct *p, long imp)
> >  {
> > +	pg_data_t *pgdat = NODE_DATA(cpu_to_node(env->dst_cpu));
> >  	struct rq *rq = cpu_rq(env->dst_cpu);
> >
> >  	if (xchg(&rq->numa_migrate_on, 1))
> >  		return;
> >
> > +	if (!env->best_task && env->best_cpu != -1)
> > +		WRITE_ONCE(pgdat->active_node_migrate, 0);
> > +
> > +	if (!p) {
> > +		if (xchg(&pgdat->active_node_migrate, 1)) {
> > +			WRITE_ONCE(rq->numa_migrate_on, 0);
> > +			return;
> > +		}
> > +	}
> > +
> >  	if (env->best_cpu != -1) {
> >  		rq = cpu_rq(env->best_cpu);
> >  		WRITE_ONCE(rq->numa_migrate_on, 0);
>
>
> Urgh, that's pretty magical code. And it doesn't even have a comment.
>
> For instance, I cannot tell why we clear that active_node_migrate thing
> right there.
>
active_node_migrate doesn't track swaps; it only tracks plain task
movement to a node. Suppose the task first finds a cpu which is idle. It
would then have set pgdat->active_node_migrate, and at that point
env->best_task is NULL but env->best_cpu is set.

Next, the task might find another cpu where a swap is more beneficial
than a move, i.e. there is a pair of tasks to be swapped. Since swaps are
not tracked, we now have to reset pgdat->active_node_migrate. The test on
best_task and best_cpu tells us whether we had set active_node_migrate.
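
To make the sequence above easier to follow, here is a rough userspace
sketch of the same flag handling. It is not the kernel code: struct node,
struct cpu, struct env and assign() are hypothetical stand-ins, C11 atomics
replace xchg()/WRITE_ONCE(), a NULL best_cpu plays the role of
best_cpu == -1, and all candidate cpus are assumed to sit on the same
destination node.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pg_data_t and struct rq. */
struct node { atomic_int active_node_migrate; };
struct cpu  { atomic_int numa_migrate_on; struct node *node; };

/* Hypothetical stand-in for the relevant part of struct task_numa_env. */
struct env {
	struct cpu *best_cpu;	/* NULL models best_cpu == -1 */
	const void *best_task;	/* NULL: plain move, non-NULL: swap */
};

/* Returns true if (dst, p) was accepted as the new best candidate. */
static bool assign(struct env *env, struct cpu *dst, const void *p)
{
	struct node *nd = dst->node;

	/* Claim the destination cpu; give up if it is already claimed. */
	if (atomic_exchange(&dst->numa_migrate_on, 1))
		return false;

	/*
	 * The previous best was a move (cpu set, no task), so this caller
	 * holds the node flag. Drop it before deciding again.
	 */
	if (!env->best_task && env->best_cpu)
		atomic_store(&nd->active_node_migrate, 0);

	/* The new candidate is a move: take the node flag or back out. */
	if (!p) {
		if (atomic_exchange(&nd->active_node_migrate, 1)) {
			atomic_store(&dst->numa_migrate_on, 0);
			return false;
		}
	}

	/* Release the cpu claimed for the previous best candidate. */
	if (env->best_cpu)
		atomic_store(&env->best_cpu->numa_migrate_on, 0);

	env->best_cpu = dst;
	env->best_task = p;
	return true;
}
```

With this model, a first call with p == NULL sets the node flag, a second
mover targeting the same node is turned away while the flag is held, and a
later call with a non-NULL p (a swap) clears the flag again.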
--
Thanks and Regards
Srikar Dronamraju