Message-ID: <55C6EFFF.5070605@cn.fujitsu.com>
Date: Sun, 9 Aug 2015 14:15:27 +0800
From: Tang Chen <tangchen@...fujitsu.com>
To: Jiang Liu <jiang.liu@...ux.intel.com>, Tejun Heo <tj@...nel.org>
CC: <mingo@...hat.com>, <akpm@...ux-foundation.org>,
<rjw@...ysocki.net>, <hpa@...or.com>, <laijs@...fujitsu.com>,
<yasu.isimatu@...il.com>, <isimatu.yasuaki@...fujitsu.com>,
<kamezawa.hiroyu@...fujitsu.com>, <izumi.taku@...fujitsu.com>,
<gongzhaogang@...pur.com>, <qiaonuohan@...fujitsu.com>,
<x86@...nel.org>, <linux-acpi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.
Hi Liu,
Have you posted your new patches?
(I mean the memory-less node support patches.)
If you are going to post them, please cc me.
And BTW, how did you reproduce the memory-less node problem?
Do you have a real memory-less node on your machine?
Thanks. :)
On 08/04/2015 04:05 PM, Jiang Liu wrote:
> On 2015/8/4 11:36, Tang Chen wrote:
>> Hi TJ,
>>
>> Sorry for the late reply.
>>
>> On 07/16/2015 05:48 AM, Tejun Heo wrote:
>>> ......
>>> so in initialization phase makes no sense any more. The best near online
>>> node for each cpu should be cached somewhere.
>>> I'm not really following. Is this because the now offline node can
>>> later come online and we'd have to break the constant mapping
>>> invariant if we update the mapping later? If so, it'd be nice to
>>> spell that out.
>> Yes. Will document this in the next version.
>>
>>>> ......
>>>> +int get_near_online_node(int node)
>>>> +{
>>>> + return per_cpu(x86_cpu_to_near_online_node,
>>>> + cpumask_first(&node_to_cpuid_mask_map[node]));
>>>> +}
>>>> +EXPORT_SYMBOL(get_near_online_node);
>>> Umm... this function is sitting on a fairly hot path and scanning a
>>> cpumask each time. Why not just build a numa node -> numa node array?
>> Indeed. I will avoid scanning a cpumask.
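Tejun's node -> node array suggestion can be sketched in plain C as a table indexed by node ID, so the hot path does one array load instead of scanning a cpumask on every call. This is a userspace model under stated assumptions: `MAX_NUMNODES_SKETCH`, `node_online_map`, `near_online_node` and `alloc_target_node()` are hypothetical stand-ins (not kernel APIs), and `get_near_online_node()` here only borrows the patch's name.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES_SKETCH 8  /* hypothetical node count for this sketch */

/* Hypothetical stand-in for the kernel's online-node mask. */
static bool node_online_map[MAX_NUMNODES_SKETCH];

/*
 * Hypothetical node -> node table: for every node, the nearest node that
 * is online (i.e. has memory). Meant to be filled outside the hot path;
 * in this sketch the caller fills it.
 */
static int near_online_node[MAX_NUMNODES_SKETCH];

/* Single array load instead of a per-call cpumask_first() scan. */
static int get_near_online_node(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES_SKETCH);
	return near_online_node[nid];
}

/* Mirrors the fallback in the quoted alloc_pages_exact_node() hunk. */
static int alloc_target_node(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES_SKETCH);  /* like VM_BUG_ON */
	if (!node_online_map[nid])
		nid = get_near_online_node(nid);
	return nid;
}
```

The table costs one int per node, but it has to be rebuilt whenever a node changes state, which is why the synchronization question that follows matters.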
>>
>>> ......
>>>
>>>> static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
>>>> unsigned int order)
>>>> {
>>>> - VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>>>> + VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>>>> +
>>>> +#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
>>>> + if (!node_online(nid))
>>>> + nid = get_near_online_node(nid);
>>>> +#endif
>>>> return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
>>>> }
>>> Ditto. Also, what are the synchronization rules for NUMA node
>>> on/offlining? If you end up updating the mapping later, how would
>>> that be synchronized against the above usages?
>> I think the near online node map should be updated whenever a node
>> goes online or offline. But on that point, I think the current NUMA
>> code has a small problem.
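Updating the map on node online/offline, as proposed here, could look roughly like the following: recompute each node's nearest online node from a distance table whenever a node changes state. The 4-node distance matrix and all names are hypothetical, and the sketch is single-threaded; it does not answer Tejun's synchronization question, since concurrent readers in the kernel would still need some protection (a lock or RCU, for example).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4  /* hypothetical node count */

/* Hypothetical SLIT-style distance table; 10 means local. */
static const int node_distance_tbl[NR_NODES][NR_NODES] = {
	{ 10, 20, 30, 30 },
	{ 20, 10, 30, 30 },
	{ 30, 30, 10, 20 },
	{ 30, 30, 20, 10 },
};

static bool node_is_online[NR_NODES];
static int near_online[NR_NODES];

/*
 * Rebuild the whole node -> nearest-online-node table. Intended to run
 * from the node online/offline path, not the allocation fast path.
 * A node gets -1 if no node is online at all; ties go to the lower ID.
 */
static void rebuild_near_online_map(void)
{
	for (int nid = 0; nid < NR_NODES; nid++) {
		int best = -1, best_dist = INT_MAX;

		for (int cand = 0; cand < NR_NODES; cand++) {
			if (!node_is_online[cand])
				continue;
			if (node_distance_tbl[nid][cand] < best_dist) {
				best_dist = node_distance_tbl[nid][cand];
				best = cand;
			}
		}
		near_online[nid] = best;
	}
}

/* Hypothetical hotplug hook: flip the node state, then rebuild the table. */
static void sketch_node_set_online(int nid, bool online)
{
	assert(nid >= 0 && nid < NR_NODES);
	node_is_online[nid] = online;
	rebuild_near_online_map();
}
```

Rebuilding the full table is O(nodes^2), which is acceptable on a rare hotplug event and keeps the lookup itself a single load.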
>>
>> As you know, firmware info binds a set of CPUs and memory to a node.
>> But at boot time, if the node has no memory (a memory-less node), it
>> won't be brought online. The CPUs on that node are still available,
>> though, and are bound to the nearest online node.
>> (Here, I mean numa_set_node(cpu, node).)
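The boot-time binding described above can be modelled with a few arrays: a CPU whose firmware node has no memory is bound to the nearest node that does. The firmware tables, `cpu_node[]` and `bind_cpus_at_boot()` are hypothetical; the only real interface mentioned in the mail is numa_set_node(cpu, node), which this sketch does not call.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count */
#define NR_NODES_SKETCH 2  /* hypothetical node count */

/* Node each CPU belongs to according to firmware (hypothetical data). */
static const int fw_cpu_node[NR_CPUS_SKETCH] = { 0, 0, 1, 1 };

/* Whether each node has memory; node 1 is memory-less in this example. */
static const bool node_has_memory[NR_NODES_SKETCH] = { true, false };

/* Nearest node with memory for each node (hypothetical, precomputed). */
static const int nearest_memory_node[NR_NODES_SKETCH] = { 0, 0 };

/* Effective CPU -> node binding, analogous to what numa_set_node() records. */
static int cpu_node[NR_CPUS_SKETCH];

/*
 * Bind every CPU to its firmware node if that node has memory, otherwise
 * to the nearest node with memory, so node-local allocations always target
 * a node that can satisfy them.
 */
static void bind_cpus_at_boot(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		int nid = fw_cpu_node[cpu];

		assert(nid >= 0 && nid < NR_NODES_SKETCH);
		cpu_node[cpu] = node_has_memory[nid] ? nid : nearest_memory_node[nid];
	}
}
```

After this runs, CPUs 2 and 3 report node 0 even though firmware places them on node 1, which is exactly the "CPUs online, node offline" state discussed next.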
>>
>> Why does the kernel do this? I think it is to ensure that memory can
>> be allocated successfully through functions like alloc_pages_node()
>> and alloc_pages_exact_node(). For these two functions, every CPU must
>> be bound to a node that has memory so that the allocation can succeed.
>>
>> That means that, for a memory-less node at boot time, the CPUs on the
>> node are online, but the node itself is not.
>>
>> It also means that "the node is online" is equivalent to "the node has
>> memory". In fact, a lot of code in the kernel relies on this rule.
>>
>>
>> But:
>> 1) cpu_up() tries to online the node without checking whether the
>> node has memory.
>> 2) try_offline_node() offlines the CPUs first, and then the memory.
>>
>> This behavior looks a little weird, or let's say ambiguous. It seems
>> that a NUMA node consists of CPUs and memory, so if the CPUs are
>> online, the node should be online as well.
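The ambiguity comes down to which per-node facts "online" is derived from. This sketch contrasts the two rules; the struct and function names are hypothetical and only loosely echo the kernel's separate N_CPU and N_MEMORY node states, without implementing them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node facts, loosely modelled on N_CPU / N_MEMORY. */
struct node_state_sketch {
	bool has_online_cpus;
	bool has_memory;
};

/* Current boot-time rule: a node is online only if it has memory. */
static bool online_if_memory(struct node_state_sketch s)
{
	return s.has_memory;
}

/* Rule argued for here: a node is online if it has online CPUs or memory. */
static bool online_if_cpus_or_memory(struct node_state_sketch s)
{
	return s.has_online_cpus || s.has_memory;
}
```

Under the second rule a memory-less node with CPUs becomes online, so code that reads node_online() as "has memory" would have to check for memory explicitly instead.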
> Hi Chen,
> I have posted a patch set to enable memoryless nodes on x86, and
> will repost it for review :) Hope it helps to solve this issue.
> Thanks!
> Gerry
>
>> Also, the main purpose of this patch set is to make the cpuid <-> nodeid
>> mapping persistent. After this patch set, alloc_pages_node() and
>> alloc_pages_exact_node() won't depend on the cpuid <-> nodeid mapping
>> any more. So the node should be online if the CPUs on it are online;
>> otherwise, we cannot set up the CPU interfaces under /sys.
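A persistent cpuid <-> nodeid mapping can be sketched as a table that records each CPU's firmware node once and never rewrites it on hotplug, even when that node is memory-less; redirecting allocations then becomes the allocator's job rather than the CPU binding's. The table size, sentinel and function names are hypothetical, not the patch set's actual code.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8           /* hypothetical CPU count */
#define NUMA_NO_NODE_SKETCH (-1)   /* hypothetical "not yet known" marker */

/* Hypothetical persistent table: filled once, never cleared on CPU offline. */
static int persistent_cpu_to_node[NR_CPUS_SKETCH];

static void init_cpu_node_map(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		persistent_cpu_to_node[cpu] = NUMA_NO_NODE_SKETCH;
}

/*
 * Record the firmware-reported node the first time a CPU is seen and keep
 * it across offline/online cycles. The node is stored as-is even if it has
 * no memory. Returns the node actually recorded for this CPU.
 */
static int record_cpu_node(int cpu, int fw_node)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (persistent_cpu_to_node[cpu] == NUMA_NO_NODE_SKETCH)
		persistent_cpu_to_node[cpu] = fw_node;
	return persistent_cpu_to_node[cpu];
}
```

Because the stored node may be memory-less, a fallback like the quoted alloc_pages_exact_node() change is what keeps allocations working, and the node has to be online for its CPUs' /sys entries.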
>>
>>
>> Unfortunately, since I don't have a machine with a memory-less node, I
>> cannot reproduce the problem right now.
>>
>> How do you think the node online behavior should be changed?
>>
>> Thanks.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/