Message-ID: <5e01bd6f-4073-1ebb-489d-2e5c529909a2@redhat.com>
Date: Tue, 8 Jun 2021 15:04:19 +0200
From: David Hildenbrand <david@...hat.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Anshuman Khandual <anshuman.khandual@....com>,
Muchun Song <songmuchun@...edance.com>,
Pavel Tatashin <pasha.tatashin@...een.com>,
Jonathan Corbet <corbet@....net>,
Stephen Rothwell <sfr@...b.auug.org.au>,
linux-doc@...r.kernel.org
Subject: Re: [PATCH v1] memory-hotplug.rst: complete admin-guide overhaul
>> +ZONE_MOVABLE
>> +============
>> +
>> +ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
>> +Further, having system RAM managed by ZONE_MOVABLE instead of one of the
>> +kernel zones can increase the number of possible transparent huge pages and
>> +dynamically allocated huge pages.
>> +
>
> I'd move the first two paragraphs from "Zone Imbalances" here to provide
> some context on what movable and unmovable allocations are.
Makes sense.
[...]
>> -How to offline memory
>> ----------------------
>> +Considerations
>
> ZONE_MOVABLE Sizing Considerations ?
>
Ack
> I'd also move the contents of "Boot Memory and ZONE_MOVABLE" here (with
> some adjustments):
>
> By default, all the memory configured at boot time is managed by the kernel
> zones and ZONE_MOVABLE is not used.
>
> To enable ZONE_MOVABLE to include the memory present at boot and to
> control the ratio between movable and kernel zones there are two command
> line options: ``kernelcore=`` and ``movablecore=``. See
> Documentation/admin-guide/kernel-parameters.rst for their description.
>
Makes sense. I'll move it to the end of the "ZONE_MOVABLE Sizing
Considerations" section.
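To check the resulting ratio between ZONE_MOVABLE and the kernel zones on a running system, something like the following could be used. This is only a minimal sketch and not part of the patch: it assumes the usual /proc/zoneinfo layout ("Node N, zone NAME" headers followed by a "managed" line per zone), and the helper names are made up for illustration.

```python
import re
from collections import defaultdict


def managed_pages_per_zone(zoneinfo_text):
    """Sum the 'managed' page counts per zone name across all nodes.

    Assumes /proc/zoneinfo-style input: a 'Node N, zone NAME' header per
    zone, followed by indented 'key value' lines such as 'managed 12345'.
    """
    totals = defaultdict(int)
    zone = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
        if header:
            zone = header.group(1)
            continue
        fields = line.split()
        # Only the two-field 'managed <pages>' line is of interest here.
        if zone is not None and len(fields) == 2 and fields[0] == "managed":
            totals[zone] += int(fields[1])
    return dict(totals)


def movable_ratio(totals):
    """Fraction of all managed pages that belong to ZONE_MOVABLE."""
    total = sum(totals.values())
    return totals.get("Movable", 0) / total if total else 0.0
```

On a live system the input would be the contents of /proc/zoneinfo, for example `movable_ratio(managed_pages_per_zone(open("/proc/zoneinfo").read()))`. A value of 0.0 matches the default case described above, where all boot memory is managed by the kernel zones.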
>> +--------------
>>
>> -You can offline a memory block by using the same sysfs interface that was used
>> -in memory onlining::
>> +We usually expect that a large portion of available system RAM will actually
>> +be consumed by user space, either directly or indirectly via the page cache. In
>> +the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
>>
>> - % echo offline > /sys/devices/system/memory/memoryXXX/state
>> +With that in mind, it makes sense that we can have a big portion of system RAM
>> +managed by ZONE_MOVABLE. However, there are some things to consider when
>> +using ZONE_MOVABLE, especially when fine-tuning zone ratios:
>>
>> -If offline succeeds, the state of the memory block is changed to be "offline".
>> -If it fails, some error core (like -EBUSY) will be returned by the kernel.
>> -Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
>> -it. If it doesn't contain 'unmovable' memory, you'll get success.
>> +- Having a lot of offline memory blocks. Even offline memory blocks consume
>> + memory for metadata and page tables in the direct map; having a lot of
>> + offline memory blocks is not a typical case, though.
>> +
>> +- Memory ballooning. Some memory ballooning implementations, such as
>> + the Hyper-V balloon, the XEN balloon, the vbox balloon and the VMWare
>
> So, everyone except virtio-mem? ;-)
Well, virtio-mem does not classify as a memory balloon in that sense, as
it only operates on its own device memory ;)
virtio-balloon and pseries CMM support balloon compaction.
> I'd drop the names, because if any of those implement balloon
> compaction later, they will surely forget to update the docs.
I can do the opposite and mention the ones that already do. Some most
probably will never support it.
"Memory ballooning without balloon compaction is incompatible with
ZONE_MOVABLE. Only some implementations, such as virtio-balloon and
pseries CMM, fully support balloon compaction."
>
>> + balloon with huge pages don't support balloon compaction and, thereby
>> + ZONE_MOVABLE.
>> +
>> + Further, CONFIG_BALLOON_COMPACTION might be disabled. In that case, balloon
>> + inflation will only perform unmovable allocations and silently create a
>> + zone imbalance, usually triggered by inflation requests from the
>> + hypervisor.
>> +
>> +- Gigantic pages are unmovable, resulting in user space consuming a
>> + lot of unmovable memory.
>> +
>> +- Huge pages are unmovable when an architecture does not support huge
>> + page migration, resulting in a similar issue as with gigantic pages.
>> +
>> +- Page tables are unmovable. Excessive swapping, mapping extremely large
>> + files or ZONE_DEVICE memory can be problematic, although only
>> + really relevant in corner cases. When we manage a lot of user space memory
>> + that has been swapped out or is served from a file/pmem/... we still need
>
> ^ persistent memory
Agreed.
>
>> + a lot of page tables to manage that memory after user space has accessed
>> + that memory once.
>> +
>> +- DAX: when we have a lot of ZONE_DEVICE memory added to the system as DAX
>> + and we are not using an altmap to allocate the memmap from device memory
>> + directly, we will have to allocate the memmap for this memory from the
>> + kernel zones.
>
> I'm not sure an admin-guide reader will know when we use an altmap and
> when we don't.
> Maybe
>
> DAX: in certain DAX configurations the memory map for the device memory will
> be allocated from the kernel zones.
Indeed, simpler and communicates the same message.
I'll also add
"KASAN can have a significant memory overhead, for example, consuming
1/8th of the total system memory size as (unmovable) tracking metadata."
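To put the 1/8th figure into numbers, a minimal sketch of the arithmetic follows. The function name and the `shadow_fraction` parameter are made up for illustration, and the 1/8 ratio is just the example from the sentence above, not a statement about every KASAN mode.

```python
GIB = 1024 ** 3


def kasan_shadow_bytes(total_ram_bytes, shadow_fraction=8):
    """Estimate unmovable tracking metadata as total RAM / shadow_fraction.

    shadow_fraction=8 mirrors the '1/8th of the total system memory'
    example; other configurations may differ.
    """
    if shadow_fraction <= 0:
        raise ValueError("shadow_fraction must be positive")
    return total_ram_bytes // shadow_fraction


# Example: a 64 GiB system under the 1/8 assumption.
overhead = kasan_shadow_bytes(64 * GIB)
print(overhead // GIB, "GiB of unmovable metadata")
```

Under that assumption, a 64 GiB machine would need about 8 GiB from the kernel zones for tracking metadata, which has to be accounted for when sizing ZONE_MOVABLE.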
Thanks Mike!
--
Thanks,
David / dhildenb