[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5593AAF4.2000405@huawei.com>
Date: Wed, 1 Jul 2015 16:55:16 +0800
From: Xishi Qiu <qiuxishi@...wei.com>
To: Tang Chen <tangchen@...fujitsu.com>
CC: <tglx@...utronix.de>, <mingo@...hat.com>, <hpa@...or.com>,
<akpm@...ux-foundation.org>, <tj@...nel.org>, <dyoung@...hat.com>,
<isimatu.yasuaki@...fujitsu.com>, <yasu.isimatu@...il.com>,
<lcapitulino@...hat.com>, <will.deacon@....com>,
<tony.luck@...el.com>, <vladimir.murzin@....com>, <fabf@...net.be>,
<kuleshovmail@...il.com>, <bhe@...hat.com>, <x86@...nel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH 1/1] mem-hotplug: Handle node hole when initializing numa_meminfo.
On 2015/7/1 15:55, Tang Chen wrote:
>
> On 07/01/2015 02:25 PM, Xishi Qiu wrote:
>> On 2015/7/1 11:16, Tang Chen wrote:
>>
>>> When parsing SRAT, all memory ranges are added into numa_meminfo.
>>> In numa_init(), before entering numa_cleanup_meminfo(), all possible
>>> memory ranges are in numa_meminfo. And numa_cleanup_meminfo() removes
>>> all ranges over max_pfn or empty.
>>>
>>> But, this only works if the nodes are continuous. Let's have a look
>>> at the following example:
>>>
>>> We have an SRAT like this:
>>> SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
>>> SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
>>> SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
>>> SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
>>> SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
>>> SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
>>> SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
>>> SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
>>> SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug
>>>
>>> On boot, only node 0,1,2,3 exist.
>>>
>>> And the numa_meminfo will look like this:
>>> numa_meminfo.nr_blks = 9
>>> 1. on node 0: [0, 60000000]
>>> 2. on node 0: [100000000, 20000000000]
>>> 3. on node 1: [20000000000, 40000000000]
>>> 4. on node 4: [40000000000, 60000000000]
>>> 5. on node 5: [60000000000, 80000000000]
>>> 6. on node 2: [80000000000, a0000000000]
>>> 7. on node 3: [a0000000000, a0800000000]
>>> 8. on node 6: [c0000000000, a0800000000]
>>> 9. on node 7: [e0000000000, a0800000000]
>>>
>>> And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because
>>> the end address is over max_pfn, which is a0800000000. But 4 and 5
>>> are not removed because their end addresses are less then max_pfn.
>>> But in fact, node 4 and 5 don't exist.
>>>
>>> In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.
>>>
>>> Since memory ranges in node 4 and 5 are in numa_meminfo, in numa_register_memblks(),
>>> node 4 and 5 will be mistakenly set to online.
>>>
>>> In this patch, we use memblock_overlaps_region() to check if ranges in
>>> numa_meminfo overlap with ranges in memory_block. Since memory_block contains
>>> all available memory at boot time, if they overlap, it means the ranges
>>> exist. If not, then remove them from numa_meminfo.
>>>
>> Hi Tang Chen,
>>
>> What's the impact of this problem?
>>
>> Command "numactl --hard" will show an empty node(no cpu and no memory,
>> but pgdat is created), right?
>
> On my box, if I run lscpu, the output looks like this:
>
> NUMA node0 CPU(s): 0-14,128-142
> NUMA node1 CPU(s): 15-29,143-157
> NUMA node2 CPU(s):
> NUMA node3 CPU(s):
> NUMA node4 CPU(s): 62-76,190-204
> NUMA node5 CPU(s): 78-92,206-220
>
> Node 2 and 3 are not exist, but they are online.
>
Yes, because srat->numa_meminfo->alloc pgdat.
Thanks,
Xishi Qiu
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists