[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <99f14975-f89f-4484-6ae1-296b242d4bf9@gmail.com>
Date: Tue, 14 Mar 2017 12:05:59 -0400
From: YASUAKI ISHIMATSU <yasu.isimatu@...il.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Igor Mammedov <imammedo@...hat.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
Greg KH <gregkh@...uxfoundation.org>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
David Rientjes <rientjes@...gle.com>,
Daniel Kiper <daniel.kiper@...cle.com>,
linux-api@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
linux-s390@...r.kernel.org, xen-devel@...ts.xenproject.org,
linux-acpi@...r.kernel.org, qiuxishi@...wei.com,
toshi.kani@....com, xieyisheng1@...wei.com, slaoub@...il.com,
iamjoonsoo.kim@....com, vbabka@...e.cz,
Reza Arbab <arbab@...ux.vnet.ibm.com>, yasu.isimatu@...il.com
Subject: Re: WTH is going on with memory hotplug sysf interface
On 03/13/2017 05:19 AM, Michal Hocko wrote:
> On Fri 10-03-17 12:39:27, Yasuaki Ishimatsu wrote:
>> On 03/10/2017 08:58 AM, Michal Hocko wrote:
> [...]
>>> OK so I did with -m 2G,slots=4,maxmem=4G -numa node,mem=1G -numa node,mem=1G which generated
>>> [...]
>>> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
>>> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
>>> [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
>>> [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x27fffffff] hotplug
>>> [ 0.000000] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x3fffffff] -> [mem 0x00000000-0x3fffffff]
>>> [ 0.000000] NODE_DATA(0) allocated [mem 0x3fffc000-0x3fffffff]
>>> [ 0.000000] NODE_DATA(1) allocated [mem 0x7ffdc000-0x7ffdffff]
>>> [ 0.000000] Zone ranges:
>>> [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
>>> [ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007ffdffff]
>>> [ 0.000000] Normal empty
>>> [ 0.000000] Movable zone start for each node
>>> [ 0.000000] Early memory node ranges
>>> [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
>>> [ 0.000000] node 0: [mem 0x0000000000100000-0x000000003fffffff]
>>> [ 0.000000] node 1: [mem 0x0000000040000000-0x000000007ffdffff]
>>>
>>> so there is neither any normal zone nor movable one at the boot time.
>>> Then I hotplugged 1G slot
>>> (qemu) object_add memory-backend-ram,id=mem1,size=1G
>>> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
>>>
>>> unfortunatelly the memory didn't show up automatically and I got
>>> [ 116.375781] acpi PNP0C80:00: Enumeration failure
>>>
>>> so I had to probe it manually (prbably the BIOS my qemu uses doesn't
>>> support auto probing - I haven't really dug further). Anyway the SRAT
>>> table printed during the boot told that we should start at 0x100000000
>>>
>>> # echo 0x100000000 > /sys/devices/system/memory/probe
>>> # grep . /sys/devices/system/memory/memory32/valid_zones
>>> Normal Movable
>>>
>>> which looks reasonably right? Both Normal and Movable zones are allowed
>>>
>>> # echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
>>> # grep . /sys/devices/system/memory/memory3?/valid_zones
>>> /sys/devices/system/memory/memory32/valid_zones:Normal
>>> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
>>>
>>> Huh, so our valid_zones have changed under our feet...
>>>
>>> # echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
>>> # grep . /sys/devices/system/memory/memory3?/valid_zones
>>> /sys/devices/system/memory/memory32/valid_zones:Normal
>>> /sys/devices/system/memory/memory33/valid_zones:Normal
>>> /sys/devices/system/memory/memory34/valid_zones:Normal Movable
>>>
>>> and again. So only the last memblock is considered movable. Let's try to
>>> online them now.
>>>
>>> # echo online_movable > /sys/devices/system/memory/memory34/state
>>> # grep . /sys/devices/system/memory/memory3?/valid_zones
>>> /sys/devices/system/memory/memory32/valid_zones:Normal
>>> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
>>> /sys/devices/system/memory/memory34/valid_zones:Movable Normal
>>>
>>
>> I think there is no strong reason which kernel has the restriction.
>> By setting the restrictions, it seems to have made management of
>> these zone structs simple.
>
> Could you be more specific please? How could this make management any
> easier when udev is basically racing with the physical hotplug and the
> result is basically undefined?
>
When changing zone from NORMAL(N) to MOVALBE(M), we must resize both zones,
zone->zone_start_pfn and zone->spanned_pages. Currently there is the
restriction.
So we just simply change:
zone(N)->spanned_pages -= nr_pages
zone(M)->zone_start_pfn -= nr_pages
But if every memory can change zone with no restriction, we must recalculate
these zones spanned_pages and zone_start_pfn follows:
memory section #
1 2 3 4 5 6 7
|N|M|N|N|N|M|M|
|
|N|N|N|N|N|M|M|
* change memory section #2 from MOVABLE to NORMAL.
then we must find next movable memory section (#6) and resize these zones.
I think when implementing movable memory, there is no requirement of this.
So kernel has the current restriction.
Thanks,
Yasuaki Ishimatsu
Powered by blists - more mailing lists