lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 10 Mar 2017 12:39:27 -0500
From:   Yasuaki Ishimatsu <yasu.isimatu@...il.com>
To:     Michal Hocko <mhocko@...nel.org>,
        Igor Mammedov <imammedo@...hat.com>
Cc:     Heiko Carstens <heiko.carstens@...ibm.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        David Rientjes <rientjes@...gle.com>,
        Daniel Kiper <daniel.kiper@...cle.com>,
        linux-api@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        linux-s390@...r.kernel.org, xen-devel@...ts.xenproject.org,
        linux-acpi@...r.kernel.org, qiuxishi@...wei.com,
        toshi.kani@....com, xieyisheng1@...wei.com, slaoub@...il.com,
        iamjoonsoo.kim@....com, vbabka@...e.cz,
        Zhang Zhen <zhenzhang.zhang@...wei.com>,
        Reza Arbab <arbab@...ux.vnet.ibm.com>,
        Tang Chen <tangchen@...fujitsu.com>
Subject: Re: WTH is going on with memory hotplug sysf interface



On 03/10/2017 08:58 AM, Michal Hocko wrote:
> Let's CC people touching this logic. A short summary is that onlining
> memory via udev is currently unusable for online_movable because blocks
> are added from lower addresses while movable blocks are allowed from
> last blocks. More below.
>
> On Thu 09-03-17 13:54:00, Michal Hocko wrote:
>> On Tue 07-03-17 13:40:04, Igor Mammedov wrote:
>>> On Mon, 6 Mar 2017 15:54:17 +0100
>>> Michal Hocko <mhocko@...nel.org> wrote:
>>>
>>>> On Fri 03-03-17 18:34:22, Igor Mammedov wrote:
>> [...]
>>>>> in current mainline kernel it triggers following code path:
>>>>>
>>>>> online_pages()
>>>>>   ...
>>>>>        if (online_type == MMOP_ONLINE_KERNEL) {
>>>>>                 if (!zone_can_shift(pfn, nr_pages, ZONE_NORMAL, &zone_shift))
>>>>>                         return -EINVAL;
>>>>
>>>> Are you sure? I would expect MMOP_ONLINE_MOVABLE here
>>> pretty much, reproducer is above so try and see for yourself
>>
>> I will play with this...
>
> OK so I did with -m 2G,slots=4,maxmem=4G -numa node,mem=1G -numa node,mem=1G which generated
> [...]
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
> [    0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
> [    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x27fffffff] hotplug
> [    0.000000] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x3fffffff] -> [mem 0x00000000-0x3fffffff]
> [    0.000000] NODE_DATA(0) allocated [mem 0x3fffc000-0x3fffffff]
> [    0.000000] NODE_DATA(1) allocated [mem 0x7ffdc000-0x7ffdffff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> [    0.000000]   DMA32    [mem 0x0000000001000000-0x000000007ffdffff]
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
> [    0.000000]   node   0: [mem 0x0000000000100000-0x000000003fffffff]
> [    0.000000]   node   1: [mem 0x0000000040000000-0x000000007ffdffff]
>
> so there is neither any normal zone nor movable one at the boot time.
> Then I hotplugged 1G slot
> (qemu) object_add memory-backend-ram,id=mem1,size=1G
> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
>
> unfortunatelly the memory didn't show up automatically and I got
> [  116.375781] acpi PNP0C80:00: Enumeration failure
>
> so I had to probe it manually (prbably the BIOS my qemu uses doesn't
> support auto probing - I haven't really dug further). Anyway the SRAT
> table printed during the boot told that we should start at 0x100000000
>
> # echo 0x100000000 > /sys/devices/system/memory/probe
> # grep . /sys/devices/system/memory/memory32/valid_zones
> Normal Movable
>
> which looks reasonably right? Both Normal and Movable zones are allowed
>
> # echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
> # grep . /sys/devices/system/memory/memory3?/valid_zones
> /sys/devices/system/memory/memory32/valid_zones:Normal
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
>
> Huh, so our valid_zones have changed under our feet...
>
> # echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
> # grep . /sys/devices/system/memory/memory3?/valid_zones
> /sys/devices/system/memory/memory32/valid_zones:Normal
> /sys/devices/system/memory/memory33/valid_zones:Normal
> /sys/devices/system/memory/memory34/valid_zones:Normal Movable
>
> and again. So only the last memblock is considered movable. Let's try to
> online them now.
>
> # echo online_movable > /sys/devices/system/memory/memory34/state
> # grep . /sys/devices/system/memory/memory3?/valid_zones
> /sys/devices/system/memory/memory32/valid_zones:Normal
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
> /sys/devices/system/memory/memory34/valid_zones:Movable Normal
>

I think there is no strong reason which kernel has the restriction.
By setting the restrictions, it seems to have made management of
these zone structs simple.

Thanks,
Yasuaki Ishimatsu

> This would explain why onlining from the last block actually works but
> to me this sounds like a completely crappy behavior. All we need to
> guarantee AFAICS is that Normal and Movable zones do not overlap. I
> believe there is even no real requirement about ordering of the physical
> memory in Normal vs. Movable zones as long as they do not overlap. But
> let's keep it simple for the start and always enforce the current status
> quo that Normal zone is physically preceeding Movable zone.
> Can somebody explain why we cannot have a simple rule for Normal vs.
> Movable which would be:
> 	- block [pfn, pfn+block_size] can be Normal if
> 	  !zone_populated(MOVABLE) || pfn+block_size < ZONE_MOVABLE->zone_start_pfn
> 	- block [pfn, pfn+block_size] can be Movable if
> 	  !zone_populated(NORMAL) || ZONE_NORMAL->zone_end_pfn < pfn
>
> I haven't fully grokked all the restrictions on the movable zone size
> based on the kernel parameters (find_zone_movable_pfns_for_nodes) but
> this shouldn't really make the situation really much more complicated I
> believe because those parameters should be mostly about static
> initialization rather than hotplug but I might be easily missing
> something.
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ