linux-kernel - Re: [PATCH 2/4] sched,numa: weigh nearby nodes for task placement on complex NUMA topologies


Message-ID: <20140509101152.GT30445@twins.programming.kicks-ass.net>
Date:	Fri, 9 May 2014 12:11:52 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	riel@...hat.com
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org, mgorman@...e.de,
	chegu_vinod@...com
Subject: Re: [PATCH 2/4] sched,numa: weigh nearby nodes for task placement on
 complex NUMA topologies

On Thu, May 08, 2014 at 01:23:29PM -0400, riel@...hat.com wrote:
> @@ -930,7 +987,7 @@ static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
>   */
>  static inline unsigned long task_weight(struct task_struct *p, int nid)
>  {
> -	unsigned long total_faults;
> +	unsigned long total_faults, score;
>  
>  	if (!p->numa_faults_memory)
>  		return 0;
> @@ -940,15 +997,32 @@ static inline unsigned long task_weight(struct task_struct *p, int nid)
>  	if (!total_faults)
>  		return 0;
>  
> -	return 1000 * task_faults(p, nid) / total_faults;
> +	score = 1000 * task_faults(p, nid);
> +	score += nearby_nodes_score(p, nid, true);
> +
> +	score /= total_faults;
> +
> +	return score;
>  }
>  
>  static inline unsigned long group_weight(struct task_struct *p, int nid)
>  {
> -	if (!p->numa_group || !p->numa_group->total_faults)
> +	unsigned long total_faults, score;
> +
> +	if (!p->numa_group)
> +		return 0;
> +
> +	total_faults = p->numa_group->total_faults;
> +
> +	if (!total_faults)
>  		return 0;
>  
> -	return 1000 * group_faults(p, nid) / p->numa_group->total_faults;
> +	score = 1000 * group_faults(p, nid);
> +	score += nearby_nodes_score(p, nid, false);
> +
> +	score /= total_faults;
> +
> +	return score;
>  }

OK, and that's just sad..

See task_numa_placement(), which does:

	for_each_online_node(nid) {
		weight = task_weight(p, nid) + group_weight(p, nid);
		if (weight > max_weight) {
			max_weight = weight;
			max_nid = nid;
		}
	}

So not only is that loop now O(nr_nodes^2), the inner loop iterates
over all nodes twice.
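
To make the cost concrete, here is a minimal counting model of that loop. It is a sketch, not kernel code: the model_* functions and count_placement_visits() are hypothetical names, and it assumes nearby_nodes_score() walks every node exactly once per call, which is how the mail describes it.

```c
/* Hypothetical model, not kernel code: it only counts node visits. */
static unsigned long node_visits;

/* Stand-in for nearby_nodes_score(); assumed to walk every node once. */
static unsigned long model_nearby_nodes_score(int nr_nodes)
{
	int n;

	for (n = 0; n < nr_nodes; n++)
		node_visits++;

	return 0; /* the score itself does not matter for the count */
}

/* Stand-ins for task_weight() and group_weight() after the patch. */
static unsigned long model_task_weight(int nr_nodes)
{
	return model_nearby_nodes_score(nr_nodes);
}

static unsigned long model_group_weight(int nr_nodes)
{
	return model_nearby_nodes_score(nr_nodes);
}

/* Same shape as the task_numa_placement() loop quoted above. */
unsigned long count_placement_visits(int nr_nodes)
{
	unsigned long weight, max_weight = 0;
	int nid, max_nid = -1;

	node_visits = 0;
	for (nid = 0; nid < nr_nodes; nid++) {
		weight = model_task_weight(nr_nodes) +
			 model_group_weight(nr_nodes);
		if (weight > max_weight) {
			max_weight = weight;
			max_nid = nid;
		}
	}
	(void)max_nid; /* unused in the model */

	return node_visits;
}
```

Under that assumption the count is 2 * nr_nodes^2: count_placement_visits(8) returns 128, where the loop used to touch each of the 8 nodes once.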

Also, the {task,group}_weight() functions used to be cheap-ish (/me mumbles
something about people using !2^n scaling factors for no sane reason).
And they're used all over with that in mind.
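
One reading of the scaling-factor remark, sketched below: a power-of-two factor such as 1024 turns the multiply into a shift. Both function names are hypothetical, and weight_scaled_1024() is an illustration, not something proposed in this thread. The division by the fault total stays in both variants.

```c
/* Share of faults on one node, scaled by 1000 as in task_weight(). */
unsigned long weight_scaled_1000(unsigned long faults, unsigned long total)
{
	if (!total)
		return 0;

	return 1000 * faults / total;
}

/* Hypothetical alternative scaled by 1024 (2^10): multiply is a shift. */
unsigned long weight_scaled_1024(unsigned long faults, unsigned long total)
{
	if (!total)
		return 0;

	return (faults << 10) / total;
}
```

The two scales rank nodes the same way but give different absolute values (250 versus 256 for one fault in four), so any caller comparing a weight against a constant would need adjusting.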

But look what you did to migrate_improves_locality(), that will now
iterate all nodes _4_ times, and it's called for every single task we try
and migrate during load balance, while holding rq->lock.
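
A rough worst-case estimate of what that means per load-balance pass, as a sketch: the factor of 4 is the figure from the mail, while balance_pass_node_visits() and its nr_candidates parameter are hypothetical and only there for illustration.

```c
/*
 * Hypothetical model: worst-case node visits in one load-balance pass,
 * assuming each candidate task triggers 4 full walks over all nodes
 * (the figure given in the mail) while rq->lock is held.
 */
unsigned long balance_pass_node_visits(unsigned long nr_nodes,
				       unsigned long nr_candidates)
{
	const unsigned long walks_per_task = 4;

	return nr_candidates * walks_per_task * nr_nodes;
}
```

With 8 nodes and 32 candidate tasks this gives 1024 node visits under the lock. Before the patch the same check did not walk the node list at all.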


