Date:	Thu, 15 Jan 2015 11:45:17 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Rik van Riel <riel@...hat.com>
Cc:	linux-kernel@...r.kernel.org, mgorman@...e.de, mingo@...hat.com
Subject: Re: [PATCH] sched,numa: do not move past the balance point if
 unbalanced

On Mon, Jan 12, 2015 at 04:30:39PM -0500, Rik van Riel wrote:
> There is a subtle interaction between the logic introduced in commit
> e63da03639cc9e6e83b62e7ef8ffdbb92421416a, the way the load balancer

  e63da03639cc ("sched/numa: Allow task switch if load imbalance improves")

I have the below in my .gitconfig to easily create them things:

[core]
        abbrev = 12
[alias]
        one = show -s --pretty='format:%h (\"%s\")'
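With that alias (and `abbrev = 12`) in place, a reference like the one quoted above comes straight out of git — assuming you run it in a linux tree that has the commit:

```shell
$ git one e63da03639cc
e63da03639cc ("sched/numa: Allow task switch if load imbalance improves")
```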

> counts the load on each NUMA node, and the way NUMA hinting faults are
> done.
> 
> Specifically, the load balancer only counts currently running tasks
> in the load, while NUMA hinting faults may cause tasks to stop, if
> the page is locked by another task.
> 
> This could cause all of the threads of a large single instance workload,
> like SPECjbb2005, to migrate to the same NUMA node. This was possible
> because occasionally they all fault on the same few pages, and only one
> of the threads remains runnable. That thread can move to the process's
> preferred NUMA node without making the imbalance worse, because nothing
> else is running at that time.
> 
> The fix is to check the direction of the net moving of load, and to
> refuse a NUMA move if it would cause the system to move past the point
> of balance.  In an unbalanced state, only moves that bring us closer
> to the balance point are allowed.

Did you also test with whatever load needed the previous thing? It's far
too easy to fix one and break the other in my experience ;-)

>  	orig_src_load = env->src_stats.load;
> -	orig_dst_load = env->dst_stats.load;
>  
> -	if (orig_dst_load < orig_src_load)
> -		swap(orig_dst_load, orig_src_load);
> -
> -	old_imb = orig_dst_load * src_capacity * 100 -
> -		  orig_src_load * dst_capacity * env->imbalance_pct;
> +	/*
> +	 * In a task swap, there will be one load moving from src to dst,
> +	 * and another moving back. This is the net sum of both moves.
> +	 * Allow the move if it brings the system closer to a balanced
> +	 * situation, without crossing over the balance point.
> +	 */

This comment seems to 'forget' about the !swap moves?

> +	moved_load = orig_src_load - src_load;
>  
> -	/* Would this change make things worse? */
> -	return (imb > old_imb);
> +	if (moved_load > 0)
> +		/* Moving src -> dst. Did we overshoot balance? */
> +		return src_load < dst_load;

So here we inhibit movement when the src cpu gained (moved_load > 0) and
src was smaller than dst.

However there is no check that the new src value is in fact bigger than
dst; src could have gained and still be smaller. And afaict that's a
valid move, even under the proposed semantics, right?

> +	else
> +		/* Moving dst -> src. Did we overshoot balance? */
> +		return dst_load < src_load;

And vice versa here.

>  }

One should use capacity muck when comparing load between CPUs.

In any case, brain hurt.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
