linux-kernel - Re: [PATCH RFC 4/5] sched,numa: calculate node scores in complex NUMA topologies

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20141012145358.GH3015@worktop>
Date:	Sun, 12 Oct 2014 16:53:58 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	riel@...hat.com
Cc:	linux-kernel@...r.kernel.org, mgorman@...e.de, chegu_vinod@...com,
	mingo@...nel.org, efault@....de, vincent.guittot@...aro.org
Subject: Re: [PATCH RFC 4/5] sched,numa: calculate node scores in complex
 NUMA topologies

On Wed, Oct 08, 2014 at 03:37:29PM -0400, riel@...hat.com wrote:
> From: Rik van Riel <riel@...hat.com>
> 
> In order to do task placement on systems with complex NUMA topologies,
> it is necessary to count the faults on nodes nearby the node that is
> being examined for a potential move.
> 
> In case of a system with a backplane interconnect, we are dealing with
> groups of NUMA nodes; each of the nodes within a group is the same number
> of hops away from nodes in other groups in the system. Optimal placement
> on this topology is achieved by counting all nearby nodes equally. When
> comparing nodes A and B at distance N, nearby nodes are those at distances
> smaller than N from nodes A or B.
> 
> Placement strategy on a system with a glueless mesh NUMA topology needs
> to be different, because there are no natural groups of nodes determined
> by the hardware. Instead, when dealing with two nodes A and B at distance
> N, N >= 2, there will be intermediate nodes at distance < N from both nodes
> A and B. Good placement can be achieved by right shifting the faults on
> nearby nodes by the number of hops from the node being scored. In this
> context, a nearby node is any node less than the maximum distance in the
> system away from the node. Those nodes are skipped for efficiency reasons,
> there is no real policy reason to do so.


> +/* Handle placement on systems where not all nodes are directly connected. */
> +static unsigned long score_nearby_nodes(struct task_struct *p, int nid,
> +					int hoplimit, bool task)
> +{
> +	unsigned long score = 0;
> +	int node;
> +
> +	/*
> +	 * All nodes are directly connected, and the same distance
> +	 * from each other. No need for fancy placement algorithms.
> +	 */
> +	if (sched_numa_topology_type == NUMA_DIRECT)
> +		return 0;
> +
> +	for_each_online_node(node) {

> +	}
> +
> +	return score;
> +}

> @@ -944,6 +1003,8 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
>  		return 0;
>  
>  	faults = task_faults(p, nid);
> +	faults += score_nearby_nodes(p, nid, hops, true);
> +
>  	return 1000 * faults / total_faults;
>  }

> @@ -961,6 +1022,8 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
>  		return 0;
>  
>  	faults = group_faults(p, nid);
> +	faults += score_nearby_nodes(p, nid, hops, false);
> +
>  	return 1000 * faults / total_faults;
>  }

So this makes {task,group}_weight() O(nr_nodes), and we call these
function from O(nr_nodes) loops, giving a total of O(nr_nodes^2)
computational complexity, right?

Seems important to mention; I realize this is only for !DIRECT, but
still, I bet the real large people (those same 512 nodes we had
previous) would not really appreciate this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/