lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <543B7C03.1080002@redhat.com>
Date:	Mon, 13 Oct 2014 03:15:15 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	linux-kernel@...r.kernel.org, mgorman@...e.de, chegu_vinod@...com,
	mingo@...nel.org, efault@....de, vincent.guittot@...aro.org
Subject: Re: [PATCH RFC 4/5] sched,numa: calculate node scores in complex
 NUMA topologies

On 10/12/2014 10:53 AM, Peter Zijlstra wrote:
> On Wed, Oct 08, 2014 at 03:37:29PM -0400, riel@...hat.com wrote:
>> From: Rik van Riel <riel@...hat.com>
>>
>> In order to do task placement on systems with complex NUMA topologies,
>> it is necessary to count the faults on nodes nearby the node that is
>> being examined for a potential move.
>>
>> In case of a system with a backplane interconnect, we are dealing with
>> groups of NUMA nodes; each of the nodes within a group is the same number
>> of hops away from nodes in other groups in the system. Optimal placement
>> on this topology is achieved by counting all nearby nodes equally. When
>> comparing nodes A and B at distance N, nearby nodes are those at distances
>> smaller than N from nodes A or B.
>>
>> Placement strategy on a system with a glueless mesh NUMA topology needs
>> to be different, because there are no natural groups of nodes determined
>> by the hardware. Instead, when dealing with two nodes A and B at distance
>> N, N >= 2, there will be intermediate nodes at distance < N from both nodes
>> A and B. Good placement can be achieved by right shifting the faults on
>> nearby nodes by the number of hops from the node being scored. In this
>> context, a nearby node is any node less than the maximum distance in the
>> system away from the node. Those nodes are skipped for efficiency reasons,
>> there is no real policy reason to do so.
>
>
>> +/* Handle placement on systems where not all nodes are directly connected. */
>> +static unsigned long score_nearby_nodes(struct task_struct *p, int nid,
>> +					int hoplimit, bool task)
>> +{
>> +	unsigned long score = 0;
>> +	int node;
>> +
>> +	/*
>> +	 * All nodes are directly connected, and the same distance
>> +	 * from each other. No need for fancy placement algorithms.
>> +	 */
>> +	if (sched_numa_topology_type == NUMA_DIRECT)
>> +		return 0;
>> +
>> +	for_each_online_node(node) {
>
>> +	}
>> +
>> +	return score;
>> +}
>
>> @@ -944,6 +1003,8 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
>>   		return 0;
>>
>>   	faults = task_faults(p, nid);
>> +	faults += score_nearby_nodes(p, nid, hops, true);
>> +
>>   	return 1000 * faults / total_faults;
>>   }
>
>> @@ -961,6 +1022,8 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
>>   		return 0;
>>
>>   	faults = group_faults(p, nid);
>> +	faults += score_nearby_nodes(p, nid, hops, false);
>> +
>>   	return 1000 * faults / total_faults;
>>   }
>
> So this makes {task,group}_weight() O(nr_nodes), and we call these
> function from O(nr_nodes) loops, giving a total of O(nr_nodes^2)
> computational complexity, right?
>
> Seems important to mention; I realize this is only for !DIRECT, but
> still, I bet the real large people (those same 512 nodes we had
> previous) would not really appreciate this.
>
If desired, we can set up a nodemask for each node that
covers only the nodes that are less than the maximum
distance away from each other.

Even on the very large systems, that is likely to be
less than a dozen nodes.

I am not sure whether to add that complexity now, or
whether that should be considered premature optimization :)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ