linux-kernel - Re: [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to the same node.


Message-Id: <20180723111622.GG30345@linux.vnet.ibm.com>
Date:   Mon, 23 Jul 2018 04:16:22 -0700
From:   Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rik van Riel <riel@...riel.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 11/19] sched/numa: Restrict migrating in parallel to
 the same node.

* Peter Zijlstra <peterz@...radead.org> [2018-07-23 12:38:30]:

> On Wed, Jun 20, 2018 at 10:32:52PM +0530, Srikar Dronamraju wrote:
> > Since task migration under numa balancing can happen in parallel, more
> > than one task might choose to move to the same node at the same time.
> > This can cause load imbalances at the node level.
> > 
> > The problem is more likely if there are more cores per node or more
> > nodes in system.
> > 
> > Use a per-node variable to indicate if task migration
> > to the node under numa balance is currently active.
> > This per-node variable will not track swapping of tasks.
> 
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 50c7727..87fb20e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1478,11 +1478,22 @@ struct task_numa_env {
> >  static void task_numa_assign(struct task_numa_env *env,
> >  			     struct task_struct *p, long imp)
> >  {
> > +	pg_data_t *pgdat = NODE_DATA(cpu_to_node(env->dst_cpu));
> >  	struct rq *rq = cpu_rq(env->dst_cpu);
> >  
> >  	if (xchg(&rq->numa_migrate_on, 1))
> >  		return;
> >  
> > +	if (!env->best_task && env->best_cpu != -1)
> > +		WRITE_ONCE(pgdat->active_node_migrate, 0);
> > +
> > +	if (!p) {
> > +		if (xchg(&pgdat->active_node_migrate, 1)) {
> > +			WRITE_ONCE(rq->numa_migrate_on, 0);
> > +			return;
> > +		}
> > +	}
> > +
> >  	if (env->best_cpu != -1) {
> >  		rq = cpu_rq(env->best_cpu);
> >  		WRITE_ONCE(rq->numa_migrate_on, 0);
> 
> 
> Urgh, that's pretty magical code. And it doesn't even have a comment.
> 
> For instance, I cannot tell why we clear that active_node_migrate thing
> right there.
> 

active_node_migrate doesn't track swaps; it only tracks task movement to
a node. Suppose the task first finds an idle cpu. It would then have set
pgdat->active_node_migrate, and at that point env->best_task is NULL but
env->best_cpu is set.

Next, the task might find another cpu where a swap is more beneficial
than a move, i.e. there is a pair of tasks to be swapped. Now we have to
reset pgdat->active_node_migrate. The test on best_task and best_cpu
tells us whether we had set active_node_migrate.
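
To make the sequence above concrete, here is a minimal user-space sketch
of the flag handling, using C11 atomics in place of xchg() and
WRITE_ONCE(). It is only an illustration under stated assumptions, not the
kernel code: struct node_state, struct cpu_state, struct numa_env and
numa_assign() are hypothetical stand-ins for pg_data_t, struct rq,
struct task_numa_env and task_numa_assign(), and the final bookkeeping
(recording best_task and best_cpu) is assumed to match the part of the
function the hunk does not show.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for pg_data_t and struct rq. */
struct node_state {
	atomic_int active_node_migrate;	/* a plain move to this node is pending */
};

struct cpu_state {
	atomic_int numa_migrate_on;	/* this cpu is claimed as a target */
};

/* Simplified task_numa_env: every candidate cpu sits on the same node. */
struct numa_env {
	struct node_state *node;
	struct cpu_state *cpus;
	int dst_cpu;		/* candidate currently being evaluated */
	int best_cpu;		/* -1 until a candidate has been accepted */
	const void *best_task;	/* NULL: plain move, non-NULL: swap partner */
};

/* Returns true if dst_cpu became the new best candidate. */
static bool numa_assign(struct numa_env *env, const void *p)
{
	struct cpu_state *dst = &env->cpus[env->dst_cpu];

	/* Someone else already claimed this cpu. */
	if (atomic_exchange(&dst->numa_migrate_on, 1))
		return false;

	/*
	 * The previous best was a plain move (best_cpu set, no best_task),
	 * so this env set the node flag earlier. Drop it before re-deciding.
	 */
	if (!env->best_task && env->best_cpu != -1)
		atomic_store(&env->node->active_node_migrate, 0);

	/* Plain move: only one may be in flight per node; swaps skip this. */
	if (!p && atomic_exchange(&env->node->active_node_migrate, 1)) {
		atomic_store(&dst->numa_migrate_on, 0);
		return false;
	}

	/* Release the cpu claimed for the previous best candidate. */
	if (env->best_cpu != -1)
		atomic_store(&env->cpus[env->best_cpu].numa_migrate_on, 0);

	/* Assumed bookkeeping, not part of the quoted hunk. */
	env->best_task = p;
	env->best_cpu = env->dst_cpu;
	return true;
}
```

In this model, a first call with p == NULL sets the node flag, a second
plain move from another env to the same node is refused, and a later call
from the first env with a swap partner clears the node flag again, which
is the case the best_task/best_cpu test is meant to catch.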

-- 
Thanks and Regards
Srikar Dronamraju