Message-ID: <2123c5e8-bba6-4ce8-9050-266a63cc2f14@redhat.com>
Date:   Thu, 16 Nov 2023 20:02:33 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Gerald Schaefer <gerald.schaefer@...ux.ibm.com>
Cc:     Sumanth Korikkar <sumanthk@...ux.ibm.com>,
        linux-mm <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        linux-s390 <linux-s390@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 4/8] mm/memory_hotplug: introduce MEM_PHYS_ONLINE/OFFLINE
 memory notifiers

On 15.11.23 16:03, Gerald Schaefer wrote:
> On Tue, 14 Nov 2023 19:27:35 +0100
> David Hildenbrand <david@...hat.com> wrote:
> 
>> On 14.11.23 19:02, Sumanth Korikkar wrote:
>>> Add new memory notifiers to mimic the dynamic ACPI event triggered logic
>>> for memory hotplug on platforms that do not generate such events. This
>>> will be used to implement "memmap on memory" feature for s390 in a later
>>> patch.
>>>
>>> Platforms such as x86 can support physical memory hotplug via ACPI. When
>>> there is physical memory hotplug, ACPI event leads to the memory
>>> addition with the following callchain:
>>> acpi_memory_device_add()
>>>     -> acpi_memory_enable_device()
>>>        -> __add_memory()
>>>
>>> After this, the hotplugged memory is physically accessible, and altmap
>>> support prepared, before the "memmap on memory" initialization in
>>> memory_block_online() is called.
>>>
>>> On s390, memory hotplug works in a different way. The available hotplug
>>> memory has to be defined upfront in the hypervisor, but it is made
>>> physically accessible only when the user sets it online via sysfs,
>>> currently in the MEM_GOING_ONLINE notifier. This requires calling
>>> add_memory() during early memory detection, in order to get the sysfs
>>> representation, but we cannot use "memmap on memory" altmap support at
>>> this stage, w/o having it physically accessible.
>>>
>>> Since no ACPI or similar events are generated, there is no way to set up
>>> altmap support, or even make the memory physically accessible at all,
>>> before the "memmap on memory" initialization in memory_block_online().
>>>
>>> The new MEM_PHYS_ONLINE notifier allows to work around this, by
>>> providing a hook to make the memory physically accessible, and also call
>>> __add_pages() with altmap support, early in memory_block_online().
>>> Similarly, the MEM_PHYS_OFFLINE notifier allows to make the memory
>>> inaccessible and call __remove_pages(), at the end of
>>> memory_block_offline().
>>>
>>> Calling __add/remove_pages() requires mem_hotplug_lock, so move
>>> mem_hotplug_begin/done() to include the new notifiers.
>>>
>>> All architectures ignore unknown memory notifiers, so this patch should
>>> not introduce any functional changes.
>>
>> Sorry to say, no. No hacks please, and this is a hack for memory that
>> has already been added to the system.
> 
> IIUC, when we enter memory_block_online(), memory has always already
> been added to the system, on all architectures. E.g. via ACPI events
> on x86, or with the existing s390 hack, where we add it at boot time,
> including memmap allocated from system memory. Without a preceding
> add_memory() you cannot reach memory_block_online() via sysfs online.

Adding that memory block at boot time is the legacy leftover s390x is
carrying along; and now we want to work around that by adding s390x
special handling to the onlining/offlining code and having memory blocks
without any memmap, or configuring an altmap at the very last minute
using an s390x-specific memory notifier.

Instead, if you want to support the altmap, the kernel should not add 
standby memory to the system (if configured for this new feature), but 
instead only remember the standby memory ranges so it knows what can 
later be added and what can't.

From there, users should have an interface where they can actually add
memory to the system, and either online it manually or just let the 
kernel online it automatically.

s390x code will call add_memory() and properly prepare an altmap if 
requested and make that standby memory available. You can then even have 
an interface to remove that memory again once offline. That will work 
with an altmap or without an altmap.
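
To make the flow above a bit more concrete, here is a rough userspace
sketch in C. It is only a toy model under stated assumptions:
standby_remember(), standby_add() and standby_remove() are made-up
names, not existing kernel or s390x interfaces, and the spots where real
code would make the memory accessible and call add_memory() (or remove
the memory again) are only comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the kernel. */
#define MAX_RANGES   8
#define ERR_UNKNOWN (-1)   /* not a remembered standby range (or table full) */
#define ERR_STATE   (-2)   /* range is in the wrong state for the request */

struct standby_range {
	uint64_t start;    /* first byte of the range */
	uint64_t size;     /* length in bytes */
	bool added;        /* true once the range was added to the system */
	bool altmap;       /* true if the memmap is placed on the range itself */
};

static struct standby_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Boot time: only remember the range, do not add it to the system. */
static int standby_remember(uint64_t start, uint64_t size)
{
	assert(nr_ranges <= MAX_RANGES);
	if (nr_ranges == MAX_RANGES || size == 0)
		return ERR_UNKNOWN;
	ranges[nr_ranges++] = (struct standby_range){ .start = start, .size = size };
	return 0;
}

/* Look up a remembered range by exact start and size. */
static struct standby_range *standby_find(uint64_t start, uint64_t size)
{
	for (size_t i = 0; i < nr_ranges; i++)
		if (ranges[i].start == start && ranges[i].size == size)
			return &ranges[i];
	return NULL;
}

/* User request: add a remembered range, with or without an altmap. */
static int standby_add(uint64_t start, uint64_t size, bool want_altmap)
{
	struct standby_range *r = standby_find(start, size);

	if (!r)
		return ERR_UNKNOWN;
	if (r->added)
		return ERR_STATE;
	/* Real code would make the memory accessible and call add_memory() here. */
	r->added = true;
	r->altmap = want_altmap;
	return 0;
}

/* User request: remove a previously added range (assumed to be offline). */
static int standby_remove(uint64_t start, uint64_t size)
{
	struct standby_range *r = standby_find(start, size);

	if (!r)
		return ERR_UNKNOWN;
	if (!r->added)
		return ERR_STATE;
	/* Real code would remove the memory and make it inaccessible here. */
	r->added = false;
	r->altmap = false;
	return 0;
}
```

The point of the model is only that the altmap decision is taken when the
memory is added, not when it is onlined, and that a range that was
removed again can be re-added with a different altmap choice.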

This approach is aligned with any other code that hot(un)plugs memory 
and is compatible with things like variable-sized memory blocks people 
have been talking about quite a while already, and altmaps that span 
multiple memory blocks to make gigantic pages in such ranges usable.

Sure, you'll have a new interface and have to enable the new handling
for the new kernel, but you're asking for support for a new feature that
cannot be supported cleanly the way any other architecture does it. But
it's a clean approach and probably should have been done that way right
from the start (decades ago).

Note: We do have the same for other architectures without ACPI that add 
memory via the probe interface. But IIRC we cannot really do any checks 
there, because these architectures have no way of identifying what

> 
> The difference is that for s390, the memory is not yet physically
> accessible, and therefore we cannot use the existing altmap support
> in memory_block_online(), which requires that the memory is accessible
> before it calls mhp_init_memmap_on_memory().
> 
> Currently, on s390 we make the memory accessible in the GOING_ONLINE
> notifier, by sclp call to the hypervisor. That is too late for altmap
> setup code in memory_block_online(), therefore we'd like to introduce
> the new notifier, to have a hook where we can make it accessible
> earlier, and after that there is no difference to how it works for
> other architectures, and we can make use of the existing altmap support.
> 
>>
>> If you want memory without an altmap to suddenly not have an altmap
>> anymore, then look into removing and readding that memory, or some way
>> to convert offline memory.
> 
> We do not want to have memory suddenly not have an altmap support
> any more, but simply get a hook so that we can prepare the memory
> to have altmap support. This means making it physically accessible,
> and calling __add_pages() for altmap support, which for other
> architecture has already happened before.
> 
> Of course, it is a hack for s390, that we must skip __add_pages()
> in the initial (arch_)add_memory() during boot time, when we want
> altmap support, because the memory simply is not accessible at that
> time. But s390 memory hotplug support has always been a hack, and
> had to be, because of how it is implemented by the architecture.

I wrote the above paragraph before reading this; it's fully aligned with
what I said above.

> 
> So we replace one hack with another one, that has the huge advantage
> that we do not need to allocate struct pages upfront from system
> memory any more, for the whole possible online memory range.
> 
> And the current approach comes without any change to existing
> interfaces, and minimal change to common code, i.e. these new
> notifiers, that should not have any impact on other architectures.
> 
> What exactly is your concern regarding the new notifiers? Is it
> useless no-op notifier calls on other archs (not sure if they
> would get optimized out by compiler)?

That it makes hotplug code more special because of s390x, instead of 
cleaning up that legacy code.

-- 
Cheers,

David / dhildenb
