Message-ID: <5e01bd6f-4073-1ebb-489d-2e5c529909a2@redhat.com>
Date:   Tue, 8 Jun 2021 15:04:19 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Mike Rapoport <rppt@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Muchun Song <songmuchun@...edance.com>,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        Jonathan Corbet <corbet@....net>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v1] memory-hotplug.rst: complete admin-guide overhaul

>> +ZONE_MOVABLE
>> +============
>> +
>> +ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
>> +Further, having system RAM managed by ZONE_MOVABLE instead of one of the
>> +kernel zones can increase the number of possible transparent huge pages and
>> +dynamically allocated huge pages.
>> +
> 
> I'd move the first two paragraphs from "Zone Imbalances" here to provide
> some context what is movable and what is unmovable allocation.

Makes sense.

[...]

>> -How to offline memory
>> ----------------------
>> +Considerations
> 
> ZONE_MOVABLE Sizing Considerations ?
> 

Ack

> I'd also move the contents of "Boot Memory and ZONE_MOVABLE" here (with
> some adjustments):
> 
>    By default, all the memory configured at boot time is managed by the kernel
>    zones and ZONE_MOVABLE is not used.
> 
>    To enable ZONE_MOVABLE to include the memory present at boot and to
>    control the ratio between movable and kernel zones there are two command
>    line options: ``kernelcore=`` and ``movablecore=``. See
>    Documentation/admin-guide/kernel-parameters.rst for their description.
> 

Makes sense. I'll move it to the end of the "ZONE_MOVABLE Sizing 
Considerations" section.
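
As a rough illustration of what "ratio between movable and kernel zones"
means in practice, here is a minimal Python sketch that computes the share
of managed pages sitting in ZONE_MOVABLE. It assumes the usual
/proc/zoneinfo layout ("Node N, zone NAME" headers followed by a "managed"
field per zone); movable_share() and SAMPLE are hypothetical names made up
for this example and are not part of the patch.

```python
import re


def movable_share(zoneinfo_text):
    """Return the fraction of managed pages that belong to ZONE_MOVABLE.

    Assumption: the input follows the /proc/zoneinfo layout, with
    "Node N, zone NAME" headers and one "managed" line per zone.
    """
    managed = {}  # zone name -> managed pages, summed over all nodes
    zone = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
        if header:
            zone = header.group(1)
            continue
        field = re.match(r"\s+managed\s+(\d+)", line)
        if field and zone is not None:
            managed[zone] = managed.get(zone, 0) + int(field.group(1))
    total = sum(managed.values())
    return managed.get("Movable", 0) / total if total else 0.0


# Hypothetical, heavily trimmed sample; real output has many more fields.
SAMPLE = """\
Node 0, zone   Normal
        spanned  1048576
        present  1048576
        managed  750000
Node 0, zone  Movable
        spanned  262144
        present  262144
        managed  250000
"""

print(movable_share(SAMPLE))
```

On a live system the same function could be fed the contents of
/proc/zoneinfo, for example movable_share(open("/proc/zoneinfo").read()).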

>> +--------------
>>   
>> -You can offline a memory block by using the same sysfs interface that was used
>> -in memory onlining::
>> +We usually expect that a large portion of available system RAM will actually
>> +be consumed by user space, either directly or indirectly via the page cache. In
>> +the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
>>   
>> -	% echo offline > /sys/devices/system/memory/memoryXXX/state
>> +With that in mind, it makes sense that we can have a big portion of system RAM
>> +managed by ZONE_MOVABLE. However, there are some things to consider when
>> +using ZONE_MOVABLE, especially when fine-tuning zone ratios:
>>   
>> -If offline succeeds, the state of the memory block is changed to be "offline".
>> -If it fails, some error core (like -EBUSY) will be returned by the kernel.
>> -Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
>> -it.  If it doesn't contain 'unmovable' memory, you'll get success.
>> +- Having a lot of offline memory blocks. Even offline memory blocks consume
>> +  memory for metadata and page tables in the direct map; having a lot of
>> +  offline memory blocks is not a typical case, though.
>> +
>> +- Memory ballooning. Some memory ballooning implementations, such as
>> +  the Hyper-V balloon, the XEN balloon, the vbox balloon and the VMWare
> 
> So, everyone except virtio-mem? ;-)

Well, virtio-mem does not qualify as a memory balloon in that sense, as 
it only operates on its own device memory ;)

virtio-balloon and pseries CMM support balloon compaction.

> I'd drop the names because if some of those will implement balloon
> compaction they surely will forget to update the docs.

I can do the opposite and mention the ones that already support it. Some 
will most probably never support it.

"Memory ballooning without balloon compaction is incompatible with 
ZONE_MOVABLE. Only some implementations, such as virtio-balloon and 
pseries CMM, fully support balloon compaction."


> 
>> +  balloon with huge pages don't support balloon compaction and, thereby
>> +  ZONE_MOVABLE.
>> +
>> +  Further, CONFIG_BALLOON_COMPACTION might be disabled. In that case, balloon
>> +  inflation will only perform unmovable allocations and silently create a
>> +  zone imbalance, usually triggered by inflation requests from the
>> +  hypervisor.
>> +
>> +- Gigantic pages are unmovable, resulting in user space consuming a
>> +  lot of unmovable memory.
>> +
>> +- Huge pages are unmovable when an architecture does not support huge
>> +  page migration, resulting in a similar issue as with gigantic pages.
>> +
>> +- Page tables are unmovable. Excessive swapping, mapping extremely large
>> +  files or ZONE_DEVICE memory can be problematic, although only
>> +  really relevant in corner cases. When we manage a lot of user space memory
>> +  that has been swapped out or is served from a file/pmem/... we still need
> 
>                                                       ^ persistent memory

Agreed.

> 
>> +  a lot of page tables to manage that memory once user space accessed that
>> +  memory once.
>> +
>> +- DAX: when we have a lot of ZONE_DEVICE memory added to the system as DAX
>> +  and we are not using an altmap to allocate the memmap from device memory
>> +  directly, we will have to allocate the memmap for this memory from the
>> +  kernel zones.
> 
> I'm not sure an admin-guide reader will know when we use an altmap and when we don't.
> Maybe
> 
>    DAX: in certain DAX configurations the memory map for the device memory will
>    be allocated from the kernel zones.

Indeed, simpler and communicates the same message.

I'll also add

"KASAN can have a significant memory overhead, for example, consuming 
1/8th of the total system memory size as (unmovable) tracking metadata."
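
To put the 1/8th figure into numbers, here is a small back-of-the-envelope
sketch in Python. It only does the arithmetic from the sentence above;
kasan_shadow_bytes() and its shadow_ratio parameter are hypothetical names
for this example, and the actual overhead depends on the KASAN mode in use.

```python
GIB = 1024 ** 3


def kasan_shadow_bytes(total_ram_bytes, shadow_ratio=8):
    """Estimate unmovable KASAN metadata as total RAM / shadow_ratio.

    Assumption: one byte of tracking metadata per shadow_ratio bytes of
    system RAM, matching the "1/8th" example in the text.
    """
    return total_ram_bytes // shadow_ratio


total = 64 * GIB
shadow = kasan_shadow_bytes(total)
print(f"{shadow // GIB} GiB of {total // GIB} GiB would be unmovable metadata")
```

That metadata has to come from the kernel zones, so it counts against
whatever is left after sizing ZONE_MOVABLE.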


Thanks Mike!

-- 
Thanks,

David / dhildenb
