lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150715214802.GL15934@mtj.duckdns.org>
Date:	Wed, 15 Jul 2015 17:48:02 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Tang Chen <tangchen@...fujitsu.com>
Cc:	mingo@...hat.com, akpm@...ux-foundation.org, rjw@...ysocki.net,
	hpa@...or.com, laijs@...fujitsu.com, yasu.isimatu@...il.com,
	isimatu.yasuaki@...fujitsu.com, kamezawa.hiroyu@...fujitsu.com,
	izumi.taku@...fujitsu.com, gongzhaogang@...pur.com,
	qiaonuohan@...fujitsu.com, x86@...nel.org,
	linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Gu Zheng <guz.fnst@...fujitsu.com>
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.

Hello,

On Tue, Jul 07, 2015 at 05:30:21PM +0800, Tang Chen wrote:
...
> Why doing this is to prevent memory allocation failure if the cpu is

"The reason for doing this ..."

> online but there is no memory on that node.
> 
> But since cpuid <-> nodeid mapping will fix after this patch-set, doing

"But since cpuid <-> nodeid mapping is planned to be made static, ..."

> so in initialization pharse makes no sense any more. The best near online
> node for each cpu should be cached somewhere.

I'm not really following.  Is this because the now offline node can
later come online and we'd have to break the constant mapping
invariant if we update the mapping later?  If so, it'd be nice to
spell that out.

>  void numa_set_node(int cpu, int node)
>  {
>  	int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
> @@ -95,7 +121,11 @@ void numa_set_node(int cpu, int node)
>  		return;
>  	}
>  #endif
> +
> +	per_cpu(x86_cpu_to_near_online_node, cpu) =
> +			find_near_online_node(numa_cpu_node(cpu));
>  	per_cpu(x86_cpu_to_node_map, cpu) = node;
> +	cpumask_set_cpu(cpu, &node_to_cpuid_mask_map[numa_cpu_node(cpu)]);
>  
>  	set_cpu_numa_node(cpu, node);
>  }
> @@ -105,6 +135,13 @@ void numa_clear_node(int cpu)
>  	numa_set_node(cpu, NUMA_NO_NODE);
>  }
>  
> +int get_near_online_node(int node)
> +{
> +	return per_cpu(x86_cpu_to_near_online_node,
> +		       cpumask_first(&node_to_cpuid_mask_map[node]));
> +}
> +EXPORT_SYMBOL(get_near_online_node);

Umm... this function is sitting on a fairly hot path and scanning a
cpumask each time.  Why not just build a numa node -> numa node array?

> @@ -702,24 +739,6 @@ void __init x86_numa_init(void)
>  	numa_init(dummy_numa_init);
>  }
>  
> -static __init int find_near_online_node(int node)
> -{
> -	int n, val;
> -	int min_val = INT_MAX;
> -	int best_node = -1;
> -
> -	for_each_online_node(n) {
> -		val = node_distance(node, n);
> -
> -		if (val < min_val) {
> -			min_val = val;
> -			best_node = n;
> -		}
> -	}
> -
> -	return best_node;
> -}

It's usually better to not mix code movement with actual changes.

> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 6ba7cf2..4a18b21 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -307,13 +307,23 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
>  	if (nid < 0)
>  		nid = numa_node_id();
>  
> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
> +	if (!node_online(nid))
> +		nid = get_near_online_node(nid);
> +#endif

Can you please introduce a wrapper function to do the above so that we
don't open code ifdefs?

> +
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }
>  
>  static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>  						unsigned int order)
>  {
> -	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> +	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
> +
> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
> +	if (!node_online(nid))
> +		nid = get_near_online_node(nid);
> +#endif
>  
>  	return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>  }

Ditto.  Also, what's the synchronization rules for NUMA node
on/offlining.  If you end up updating the mapping later, how would
that be synchronized against the above usages?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ