Date:   Thu, 16 Mar 2017 20:01:25 +0100
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>, linux-mm@...ck.org,
        Mel Gorman <mgorman@...e.de>, qiuxishi@...wei.com,
        toshi.kani@....com, xieyisheng1@...wei.com, slaoub@...il.com,
        Zhang Zhen <zhenzhang.zhang@...wei.com>,
        Reza Arbab <arbab@...ux.vnet.ibm.com>,
        Yasuaki Ishimatsu <yasu.isimatu@...il.com>,
        Tang Chen <tangchen@...fujitsu.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Daniel Kiper <daniel.kiper@...cle.com>,
        Igor Mammedov <imammedo@...hat.com>,
        Andi Kleen <ak@...ux.intel.com>
Subject: Re: ZONE_NORMAL vs. ZONE_MOVABLE

Hello Joonsoo,

On Thu, Mar 16, 2017 at 02:31:22PM +0900, Joonsoo Kim wrote:
> I don't follow up previous discussion so please let me know if I miss
> something. I'd just like to mention about sticky pageblocks.

The interesting part of the previous discussion relevant for the
sticky movable pageblock is this part from Vitaly:

=== quote ===
Now we have

[Normal][Normal][Normal][Movable][Movable][Movable]

we could have

[Normal][Normal][Movable][Normal][Movable][Normal]
=== quote ===

Suppose you're an admin. Starting from all-offlined hotplug memory,
you can try the following:

kvm ~ # cat /sys/devices/system/memory/memory3[6-9]/online
0
0
0
0
kvm ~ # python ~andrea/zoneinfo.py 
Zone: DMA       Present: 15M    Managed: 15M    Start: 0M       End: 16M
Zone: DMA32     Present: 2031M  Managed: 1892M  Start: 16M      End: 2047M

All hotplug memory is offline, no Movable zone.

Then you online interleaved:

kvm ~ # echo online_movable > /sys/devices/system/memory/memory39/online
kvm ~ # python ~andrea/zoneinfo.py 
Zone: DMA       Present: 15M    Managed: 15M    Start: 0M       End: 16M
Zone: DMA32     Present: 2031M  Managed: 1892M  Start: 16M      End: 2047M
Zone: Movable   Present: 128M   Managed: 128M   Start: 4.9G     End: 5.0G
kvm ~ # echo online > /sys/devices/system/memory/memory38/online
kvm ~ # python ~andrea/zoneinfo.py 
Zone: DMA       Present: 15M    Managed: 15M    Start: 0M       End: 16M
Zone: DMA32     Present: 2031M  Managed: 1892M  Start: 16M      End: 2047M
Zone: Normal    Present: 128M   Managed: 128M   Start: 4.0G     End: 4.9G
Zone: Movable   Present: 128M   Managed: 128M   Start: 4.9G     End: 5.0G

So far so good.

kvm ~ # echo online_movable > /sys/devices/system/memory/memory37/online
kvm ~ # python ~andrea/zoneinfo.py 
Zone: DMA       Present: 15M    Managed: 15M    Start: 0M       End: 16M
Zone: DMA32     Present: 2031M  Managed: 1892M  Start: 16M      End: 2047M
Zone: Normal    Present: 256M   Managed: 256M   Start: 4.0G     End: 4.9G
Zone: Movable   Present: 128M   Managed: 128M   Start: 4.9G     End: 5.0G

Oops: you thought you onlined memory37 as movable, but instead it
silently went into the Normal zone (without even erroring out), so it's
definitely not going to be unpluggable and it's definitely not
movable. It all falls apart here. The admin won't run my zoneinfo.py
script, which I had to write specifically to understand what a mess
interleaved online_movable was making.
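
The sequence above can be reproduced with a toy model. This is only a
sketch under a stated assumption, not the kernel's logic: it assumes the
Movable zone has to sit entirely above every Normal block, and that an
online_movable request which would break that rule silently lands in
Normal, as the transcript shows. The online() helper and the zones dict
are hypothetical names:

```python
def online(zones, block, request):
    """Online memory block `block` and return the zone it really lands in.

    zones maps "Normal" and "Movable" to sets of block indices.
    request is "normal" or "movable".
    """
    normal, movable = zones["Normal"], zones["Movable"]
    if request == "movable":
        # Assumed rule: honour the request only if the block lies above
        # every Normal block, so Movable never reaches across Normal memory.
        if not normal or block > max(normal):
            movable.add(block)
            return "Movable"
        # Silent fallback, matching what the transcript shows for memory37.
        normal.add(block)
        return "Normal"
    if request == "normal":
        if not movable or block < min(movable):
            normal.add(block)
            return "Normal"
        # The transcript does not cover this case, so the model refuses it.
        raise ValueError("block %d is not below the Movable zone" % block)
    raise ValueError("unknown request %r" % (request,))


zones = {"Normal": set(), "Movable": set()}
results = [online(zones, 39, "movable"),
           online(zones, 38, "normal"),
           online(zones, 37, "movable")]
```

With 128M blocks the final state is two blocks (256M) in Normal and one
block (128M) in Movable, the same totals as in the last zoneinfo.py
output above.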

The admin is much better off never touching
/sys/devices/system/memory/memory37, and just using the in-kernel
onlining, at the very least until udev and the /sys interface are fixed
for both movable and non-movable hotplug onlining.
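
Until then, anything that does write to the /sys files would have to
check afterwards where the block really ended up. A minimal sketch of
such a check, assuming memoryN covers the physical range starting at
N * block size (which agrees with memory39 showing up at 4.9G-5.0G with
128M blocks) and that the zone spans come from a zoneinfo-style report.
The helper names and the exact span values are illustrative. Falling
inside a zone's span is necessary but not sufficient, since spans can
contain holes:

```python
MIB = 2**20


def block_range(block, block_size):
    """Physical [start, end) byte range assumed for memory block `block`."""
    start = block * block_size
    return start, start + block_size


def zones_spanning(block, block_size, spans):
    """Names of zones whose [start, end) span fully contains the block."""
    lo, hi = block_range(block, block_size)
    return [name for name, (start, end) in spans.items()
            if start <= lo and hi <= end]


def landed_in_movable(block, block_size, spans):
    """True only if the block lies inside the Movable span and no other."""
    return zones_spanning(block, block_size, spans) == ["Movable"]


# Spans approximated from the last zoneinfo.py output above.
spans = {"Normal": (4096 * MIB, 4992 * MIB),
         "Movable": (4992 * MIB, 5120 * MIB)}
```

Here landed_in_movable(37, 128 * MIB, spans) is False while the same
call for block 39 is True, which is the silent failure described above.
On a real system the block size would presumably be read from
/sys/devices/system/memory/block_size_bytes (a hex value) rather than
hardcoded.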

> Before that, I'd like to say that a lot of code already deals with zone
> overlap. Zone overlap exists for a long time although I don't know exact
> history. IIRC, Mel fixed such a case before and compaction code has a
> check for it. And, I added the overlap check to some pfn iterators which
> doesn't have such a check for preparation of introducing a new zone,
> ZONE_CMA, which has zone range overlap property. See following commits.
> 
> 'ba6b097', '9d43f5a', 'a91c43c'.
> 

So you suggest creating a full overlap like:

     --------------- Movable --------------
     --------------- Normal  --------------

And then search for pages in the Movable zone buddy, which would
contain only those that are onlined with echo online_movable?

> Come to my main topic, I disagree that sticky pageblock would be
> superior to the current separate zone approach. There is some reasons
> about the objection to sticky movable pageblock in following link.
> 
> Sticky movable pageblock is conceptually same with MIGRATE_CMA and it
> will cause many subtle issues like as MIGRATE_CMA did for CMA users.
> MIGRATE_CMA introduces many hooks in various code path, and, to fix the
> remaining issues, it needs more hooks. I don't think it is

I'm not saying the sticky movable pageblocks are the way to go. On the
contrary, we're saying the Movable zone constraints can be better
satisfied by the in-kernel onlining mechanism, and it's overall much
simpler for the user to use the in-kernel onlining than to try to
fix udev to be synchronous and implement sticky movable pageblocks
to make the /sys interface usable without unexpected side effects. And
I would suggest looking into dropping the MOVABLE_NODE config option
first (and turning it into a kernel parameter, if anything).

I agree sticky movable pageblocks may slow things down and increase
complexity, so it'd be better not to have to implement them.

> maintainable approach. If you see following link which implements ZONE
> approach, you can see that many hooks are removed in the end.
> 
> lkml.kernel.org/r/1476414196-3514-1-git-send-email-iamjoonsoo.kim@....com
> 
> I don't know exact requirement on memory hotplug so it would be
> possible that ZONE approach is not suitable for it. But, anyway, sticky
> pageblock seems not to be a good solution to me.

The fact that sticky movable pageblocks aren't ideal for CMA doesn't
mean they're not ideal for memory hotunplug, though.

With CMA there's no point in having the sticky movable pageblocks
scattered around, and using sticky movable pageblocks is purely a
misfeature because you need the whole CMA area contiguous; hence a
ZONE_CMA is ideal.

With memory hotplug, by contrast, the sticky movable pageblocks would
allow the kernel to satisfy the current /sys API, and they would
have no downside, unlike in the CMA case where the size of the
allocation is unknown.

If we can make zone overlap work with a 100% overlap across the whole
node, that would be a fine alternative. The zoneinfo.py output will
look weird, but if that's the only downside it's no big deal. With
sticky movable pageblocks it'll all be ZONE_NORMAL; with overlap it'll
all be both ZONE_NORMAL and ZONE_MOVABLE at the same time.
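
The two alternatives can be compared on Vitaly's interleaved layout with
a toy model. This is only an illustration under assumed names (Block,
blocks_of_zone, sticky_blocks), not kernel code: with full overlap both
zones span the whole node and each block is tagged with the zone that
owns it, while with sticky movable pageblocks there is a single Normal
zone and a per-block flag:

```python
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    zone: str                     # zone that owns this block
    sticky_movable: bool = False  # only used in the sticky-pageblock variant


def blocks_of_zone(blocks, zone, span):
    """Walk a zone's span, skipping blocks owned by an overlapping zone."""
    lo, hi = span
    return [b.index for b in blocks if lo <= b.index <= hi and b.zone == zone]


def sticky_blocks(blocks):
    """Blocks restricted to movable allocations in the single-zone variant."""
    return [b.index for b in blocks if b.sticky_movable]


layout = ["Normal", "Normal", "Movable", "Normal", "Movable", "Normal"]

# Variant 1: full overlap, both zones span blocks 0..5.
overlap = [Block(i, z) for i, z in enumerate(layout)]
whole_node = (0, len(layout) - 1)

# Variant 2: everything is ZONE_NORMAL, movable-only blocks carry a flag.
sticky = [Block(i, "Normal", z == "Movable") for i, z in enumerate(layout)]
```

In the overlap variant blocks_of_zone(overlap, "Movable", whole_node)
selects blocks 2 and 4 even though the Normal span covers them too. In
the sticky variant the same blocks are found through the flag while
every block reports zone "Normal".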

Again, with the in-kernel onlining none of the above is necessary, as
nobody should then need to echo online/online_movable >memory*/online
ever again and it can all be obsoleted. So before dropping the only
option we have that works flawlessly, we should fix all the above in
udev and /sys, and provide full zone overlap or sticky movable pageblocks.

Thanks,
Andrea
