linux-kernel - Re: [RFC PATCH 0/5] mm, memory_hotplug: allocate memmap from hotadded memory


Date:   Wed, 26 Jul 2017 17:06:59 -0400
From:   Jerome Glisse <jglisse@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>, Vlastimil Babka <vbabka@...e.cz>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Reza Arbab <arbab@...ux.vnet.ibm.com>,
        Yasuaki Ishimatsu <yasu.isimatu@...il.com>,
        qiuxishi@...wei.com, Kani Toshimitsu <toshi.kani@....com>,
        slaoub@...il.com, Joonsoo Kim <js1304@...il.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Daniel Kiper <daniel.kiper@...cle.com>,
        Igor Mammedov <imammedo@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Dan Williams <dan.j.williams@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michal Hocko <mhocko@...e.com>,
        Paul Mackerras <paulus@...ba.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Tony Luck <tony.luck@...el.com>,
        Will Deacon <will.deacon@....com>
Subject: Re: [RFC PATCH 0/5] mm, memory_hotplug: allocate memmap from
 hotadded memory

On Wed, Jul 26, 2017 at 10:33:28AM +0200, Michal Hocko wrote:
> Hi,
> this is another step to make the memory hotplug more usable. The primary
> goal of this patchset is to reduce memory overhead of the hot added
> memory (at least for SPARSE_VMEMMAP memory model). Currently we use
> kmalloc to populate memmap (struct page array) which has two main
> drawbacks a) it consumes additional memory until the hotadded memory
> itself is onlined and b) memmap might end up on a different numa node
> which is especially true for movable_node configuration.
> 
> a) is a problem especially for memory hotplug based memory "ballooning"
> solutions when the delay between physical memory hotplug and the
> onlining can lead to OOM and that led to introduction of hacks like auto
> onlining (see 31bc3858ea3e ("memory-hotplug: add automatic onlining
> policy for the newly added memory")).
> b) can have performance drawbacks.
> 
> One way to mitigate both issues is to simply allocate memmap array
> (which is the largest memory footprint of the physical memory hotplug)
> from the hotadded memory itself. VMEMMAP memory model allows us to map
> any pfn range so the memory doesn't need to be online to be usable
> for the array. See patch 3 for more details. In short I am reusing an
> existing vmem_altmap which wants to achieve the same thing for nvdimm
> device memory.
> 
> I am sending this as an RFC because this has seen only very limited
> testing and I am mostly interested in opinions on the chosen
> approach. I had to touch some arch code and I have no idea whether my
> changes make sense there (especially ppc). Therefore I would highly
> appreciate it if arch maintainers checked patch 2.
> 
> Patches 4 and 5 should be straightforward cleanups.
> 
> There is also one potential drawback, though. If somebody uses memory
> hotplug for 1G (gigantic) hugetlb pages then this scheme will not work
> for them obviously because each memory section will contain a 2MB reserved
> area. I am not really sure anybody does that or how reliably that
> can work actually. Nevertheless, I _believe_ that onlining more memory
> into virtual machines is much more common usecase. Anyway if there ever
> is a strong demand for such a usecase we have basically 3 options a)
> enlarge memory sections b) enhance altmap allocation strategy and reuse
> low memory sections to host memmaps of other sections on the same NUMA
> node c) have the memmap allocation strategy configurable to fall back to
> the current allocation.
> 
> Are there any other concerns, ideas, comments?
> 
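
For reference, the 2MB per-section figure quoted above can be reproduced
with a small userspace sketch. The constants are assumptions (typical
x86_64 defaults: 128MB sections, 4KB pages, 64-byte struct page), not
values taken from the patchset, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed x86_64 defaults; other architectures and configs differ. */
#define ASSUMED_SECTION_SIZE_BITS 27UL  /* 128MB memory section */
#define ASSUMED_PAGE_SHIFT        12UL  /* 4KB base page */
#define ASSUMED_STRUCT_PAGE_BYTES 64UL  /* common sizeof(struct page) */
#define GIGANTIC_PAGE_BYTES       (1UL << 30)

/* Number of base pages covered by one memory section. */
static unsigned long pages_per_section(void)
{
	assert(ASSUMED_SECTION_SIZE_BITS > ASSUMED_PAGE_SHIFT);
	return 1UL << (ASSUMED_SECTION_SIZE_BITS - ASSUMED_PAGE_SHIFT);
}

/* Bytes of memmap (struct page array) needed to describe one section. */
static unsigned long memmap_bytes_per_section(void)
{
	return pages_per_section() * ASSUMED_STRUCT_PAGE_BYTES;
}

/* Number of sections a single 1G gigantic page would have to span. */
static unsigned long sections_per_gigantic_page(void)
{
	return GIGANTIC_PAGE_BYTES >> ASSUMED_SECTION_SIZE_BITS;
}
```

Under these assumptions memmap_bytes_per_section() comes to 2MB and a 1G
page spans 8 sections, so a reserved area at the start of each section
leaves no contiguous 1G range free, which is the gigantic hugetlb drawback
described in the cover letter.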

This does not seem to be an opt-in change, i.e. if I am reading patch 3
correctly, if an altmap is not provided to __add_pages() you fall back
to allocating from the beginning of the zone. This will not work with HMM,
i.e. device private memory. So at the very least I would like to see some
way to opt out of this. Maybe a new argument like bool forbid_altmap?
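
To make the suggestion concrete, here is a minimal userspace model of the
decision I have in mind. Everything in it is hypothetical: the enum, the
struct and pick_memmap_source() are illustration only and are neither the
kernel API nor the code in patch 3; only the forbid_altmap name comes from
the proposal above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: where the struct page array for a new range would live. */
enum memmap_source {
	MEMMAP_FROM_ALTMAP,         /* caller supplied an altmap */
	MEMMAP_FROM_HOTADDED_RANGE, /* default fallback as I read patch 3 */
	MEMMAP_FROM_REGULAR_ALLOC   /* ordinary allocation, as before the series */
};

/* Hypothetical stand-in for struct vmem_altmap; fields are illustrative. */
struct altmap_model {
	unsigned long base_pfn;
	unsigned long free;
};

/*
 * Sketch of the opt-out: a caller whose memory cannot hold its own memmap
 * (e.g. device private memory that the CPU cannot access) passes
 * forbid_altmap = true and gets the regular allocation path.
 */
static enum memmap_source
pick_memmap_source(const struct altmap_model *altmap, bool forbid_altmap)
{
	if (forbid_altmap)
		return MEMMAP_FROM_REGULAR_ALLOC;
	if (altmap != NULL)
		return MEMMAP_FROM_ALTMAP;
	return MEMMAP_FROM_HOTADDED_RANGE;
}
```

In this model an HMM-style caller would use pick_memmap_source(NULL, true),
while regular hotplug keeps the new default with (NULL, false). Letting the
flag win over a supplied altmap is an arbitrary choice for the sketch.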

Cheers,
Jérôme