Message-ID: <20190801073931.GA16659@linux>
Date: Thu, 1 Aug 2019 09:39:40 +0200
From: Oscar Salvador <osalvador@...e.de>
To: akpm@...ux-foundation.org
Cc: dan.j.williams@...el.com, david@...hat.com,
pasha.tatashin@...een.com, mhocko@...e.com,
anshuman.khandual@....com, Jonathan.Cameron@...wei.com,
vbabka@...e.cz, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/5] Allocate memmap from hotadded memory
On Thu, Jul 25, 2019 at 06:02:02PM +0200, Oscar Salvador wrote:
> Here we go with v3.
>
> v2 -> v3:
> * Reworked the vmemmap pages handling.
> Prior to this version, I was (ab)using hugepage fields
> from struct page, while here I am officially adding a new
> sub-page type with the fields I need.
>
> * Dropped MHP_MEMMAP_{MEMBLOCK,DEVICE} in favor of MHP_MEMMAP_ON_MEMORY.
> While I am still not 100% sure this is the right decision, and while
> I still see some gain in having MHP_MEMMAP_{MEMBLOCK,DEVICE},
> having only one flag eases the code.
> If the user wants to allocate memmaps per memblock, they will
> have to call add_memory() variants with memory-block granularity.
>
> If a clearer use case for an MHP_MEMMAP_MEMBLOCK flag shows up in
> the future, so that the user does not have to bother about how
> add_memory() variants are called but only passes a flag, we can add
> it. Actually, I already have the code, so adding it in the future is
> going to be easy.
>
> * Added a granularity check when hot-removing memory:
> memory must be removed with the same granularity it was added with.
>
> [Testing]
>
> - x86_64: small and large memblocks (128MB, 1G and 2G)
>
> So far, only acpi memory hotplug uses the new flag.
> The other callers can be changed depending on their needs.
>
> [Coverletter]
>
> This is another step to make memory hotplug more usable. The primary
> goal of this patchset is to reduce the memory overhead of hot-added
> memory (at least for the SPARSEMEM_VMEMMAP memory model). The way we
> currently populate the memmap (struct page array) has two main
> drawbacks:
>
> a) it consumes additional memory until the hot-added memory itself is
> onlined, and
> b) the memmap might end up on a different NUMA node, which is
> especially true for the movable_node configuration.
>
> a) is a problem especially for memory-hotplug-based memory
> "ballooning" solutions, where the delay between physical memory
> hotplug and onlining can lead to OOM; that led to the introduction of
> hacks like auto onlining (see 31bc3858ea3e ("memory-hotplug: add
> automatic onlining policy for the newly added memory")).
>
> b) can have performance drawbacks.
>
> One way to mitigate all these issues is to simply allocate the memmap
> array (which is the largest memory footprint of physical memory
> hotplug) from the hot-added memory itself. The SPARSEMEM_VMEMMAP
> memory model allows us to map any pfn range, so the memory does not
> need to be online to be usable for the array. See patch 3 for more
> details. This feature is only usable when CONFIG_SPARSEMEM_VMEMMAP is
> set.
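>
> To put a rough number on that overhead: with 4KiB base pages and the
> usual 64-byte struct page, the memmap eats about 1.6% of the
> hot-added range. A back-of-the-envelope check (plain userspace C,
> sizes assumed rather than measured):
>
> 	#include <stdio.h>
>
> 	int main(void)
> 	{
> 		unsigned long block = 128UL << 20; /* 128MB memory block */
> 		unsigned long page_size = 4096;    /* 4KiB base pages */
> 		unsigned long page_desc = 64;      /* typical sizeof(struct page) */
> 		unsigned long pages = block / page_size;
>
> 		/* one struct page per base page */
> 		printf("memmap: %luMB per 128MB block (%.2f%%)\n",
> 		       pages * page_desc >> 20,
> 		       100.0 * pages * page_desc / block);
> 		return 0;
> 	}
>
> That is 2MB per 128MB memory block, roughly the amount this series
> would carve out of the hot-added range instead of the page allocator.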
>
> [Overall design]:
>
> Implementation-wise, we reuse the vmem_altmap infrastructure to
> override the default allocator used by vmemmap_populate. Once the
> memmap is allocated, we need a way to mark the altmap pfns that were
> used for the allocation. If the MHP_MEMMAP_ON_MEMORY flag was passed,
> we set up the layout of the altmap structure at the beginning of
> __add_pages() and then call mark_vmemmap_pages().
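>
> In rough kernel C, the flow looks something like the below. This is a
> sketch of the idea, not the exact diff; mark_vmemmap_pages() is the
> helper this series introduces, and the struct vmem_altmap fields are
> the existing ones (base_pfn, free):
>
> 	/* inside __add_pages(), before the memmap is populated */
> 	struct vmem_altmap altmap = {
> 		.base_pfn = pfn,	/* memmap is carved from range start */
> 		.free	  = nr_pages,	/* pfns the altmap may hand out */
> 	};
>
> 	if (restrictions->flags & MHP_MEMMAP_ON_MEMORY)
> 		restrictions->altmap = &altmap;
>
> 	/* vmemmap_populate() now draws its pages from the altmap
> 	 * instead of the page allocator ... */
>
> 	/* ... and afterwards the consumed pfns get flagged */
> 	if (restrictions->altmap)
> 		mark_vmemmap_pages(restrictions->altmap);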
>
> The MHP_MEMMAP_ON_MEMORY flag tells us to allocate memmaps from the
> hot-added range. If a caller wants memmaps to be allocated per memory
> block, it has to call add_memory() variants at memory-block
> granularity spanning the whole range, while if it wants the memmap
> allocated for the whole range at once, a single call will do.
>
> E.g., we want to add 384MB (3 sections, 3 memory blocks):
>
> add_memory(0x1000, size_memory_block);
> add_memory(0x2000, size_memory_block);
> add_memory(0x3000, size_memory_block);
>
> or
>
> add_memory(0x1000, size_memory_block * 3);
>
> One thing worth mentioning is that vmemmap pages residing in movable
> memory are not a show-stopper for that memory to be offlined/migrated
> away. Vmemmap pages are simply ignored in that case, and they stick
> around until the sections referred to by those vmemmap pages are
> hot-removed.
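>
> In code terms, "ignored" boils down to something like this in the
> offlining path (again just a sketch; PageVmemmap() stands for the
> test backing the new sub-page type this series adds):
>
> 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> 		struct page *page = pfn_to_page(pfn);
>
> 		/* vmemmap pages are neither isolated nor migrated */
> 		if (PageVmemmap(page))
> 			continue;
>
> 		/* everything else is isolated/migrated as usual */
> 		...
> 	}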
Gentle ping :-)
--
Oscar Salvador
SUSE L3