Message-ID: <aC2sUdhavboOgS83@li-2b55cdcc-350b-11b2-a85c-a78bff51fc11.ibm.com>
Date: Wed, 21 May 2025 12:34:57 +0200
From: Sumanth Korikkar <sumanthk@...ux.ibm.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-mm <linux-mm@...ck.org>, Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        linux-s390 <linux-s390@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 1/4] mm/memory_hotplug: Add interface for runtime
 (de)configuration of memory

> > Introduce new interface on s390 with the following attributes:
> > 
> > 1) Attribute1:
> > /sys/firmware/memory/block_size_bytes
> 
> I assume this will be the storage increment size.

Hi David,

No, this is the memory block size.

> > 2) Attribute2:
> > /sys/firmware/memory/memoryX/config
> > echo 0 > /sys/firmware/memory/memoryX/config  -> deconfigure memoryX
> > echo 1 > /sys/firmware/memory/memoryX/config ->  configure memoryX
> 
> And these would configure individual storage increments, essentially calling
> add_memory() and (if possible because we could offline the memory)
> remove_memory().

These would configure or deconfigure memory in units of entire memory
blocks, not individual storage increments.

As I understand it, add_memory() operates on memory block granularity,
and this is enforced by check_hotplug_memory_range(), which ensures the
requested range aligns with the memory block size.
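
To illustrate the alignment rule, here is a minimal userspace model in
Python. It is only a sketch of the check as I understand it, not the kernel
code; the function reuses the kernel name for readability, and the 128 MiB
block size is an assumed value chosen for the example.

```python
import errno


def check_hotplug_memory_range(start: int, size: int, block_size: int) -> int:
    """Userspace model of the range check: 0 if acceptable, else -EINVAL.

    The range must be non-empty, and both its start address and its size
    must be multiples of the memory block size.
    """
    if size == 0 or start % block_size != 0 or size % block_size != 0:
        return -errno.EINVAL
    return 0


BLOCK = 128 << 20  # assumed memory block size (128 MiB), illustration only

# Two whole blocks starting at a block boundary: accepted.
print(check_hotplug_memory_range(4 * BLOCK, 2 * BLOCK, BLOCK))
# Half a block: rejected, since the size is not block aligned.
print(check_hotplug_memory_range(4 * BLOCK, BLOCK // 2, BLOCK))
```

So a request covering only part of a memory block would be rejected before
any memory is added.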

> > 3) Attribute3:
> > /sys/firmware/memory/memoryX/altmap_required
> > echo 0 > /sys/firmware/memory/memoryX/altmap_required -> noaltmap
> > echo 1 > /sys/firmware/memory/memoryX/altmap_required -> altmap
> > echo N > /sys/firmware/memory/memoryX/altmap_required -> variable size
> > 	 altmap grouping (possible future requirements),
> > 	 where N specifies the number of memory blocks that the current
> > 	 memory block manages altmap. There are two possibilities here:
> >          * If the altmap cannot fit entirely within memoryX, it can
> >            extend into memoryX+1, meaning the altmap metadata will span
> >            across multiple memory blocks.
> >          * If the altmap for memory range cannot fit within memoryX,
> >            then config will return -EINVAL.
> 
> Do we really still need this when we can configure/deconfigure?
> 
> I mean, on s390x, the most important use case for memmap-on-memory was not
> wasting memory for offline memory blocks.
> 
> But with a configuration interface like this ... the only benefit is being
> able to more-reliably add memory in low-memory conditions. An unlikely
> scenario with standby storage IMHO.
> 
> Note that I dislike exposing "altmap" to the user :) Dax calls it
> "memmap_on_memory", and it is a device attribute.
> 
> As soon as we go down that path we have the complexity of having to group
> memory blocks etc, and if we can just not go down that path right now it
> will make things a lot simpler.
> 
> (especially, as you document above, the semantics become *really* weird)
> 
> As yet another point, I am not sure if someone really needs a per-memory
> block control of the memmap-on-memory feature.
> 
> If we could simplify here, that would be great ...

The original motivation for introducing memmap_on_memory on s390 was to
avoid using online memory to store struct page metadata, particularly
for standby memory blocks. This became critical in cases where there was
an imbalance between standby and online memory, potentially leading to
boot failures due to insufficient memory for metadata allocation.

To address this, memmap_on_memory was utilized on s390. However, in its
current form, it adds altmap metadata at the start of each memory block
at the time of addition, and this configuration is static. It cannot be
changed at runtime.
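
To put a rough number on the metadata cost, the arithmetic can be sketched
as below. The 4 KiB page size, the 64-byte struct page and the 128 MiB
block size are assumptions for the example, and the sketch ignores any
additional alignment the kernel may apply to the altmap.

```python
PAGE_SIZE = 4096        # assumed base page size in bytes
STRUCT_PAGE_SIZE = 64   # assumed sizeof(struct page) in bytes


def memmap_bytes(block_size: int) -> int:
    """Bytes of struct page metadata needed to describe one memory block."""
    pages = block_size // PAGE_SIZE
    raw = pages * STRUCT_PAGE_SIZE
    # Round up to whole pages, since the memmap occupies whole pages.
    return -(-raw // PAGE_SIZE) * PAGE_SIZE


def usable_bytes(block_size: int, memmap_on_memory: bool) -> int:
    """Bytes left for normal use once the block's own memmap is carved out."""
    if memmap_on_memory:
        return block_size - memmap_bytes(block_size)
    return block_size


BLOCK = 128 << 20  # assumed 128 MiB memory block
print(memmap_bytes(BLOCK))        # metadata per block, in bytes
print(usable_bytes(BLOCK, True))  # remaining bytes with memmap_on_memory
```

Under these assumptions the memmap costs 1/64 of each block, taken either
from the block itself (memmap_on_memory) or from already online memory.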

I was wondering about the following practical scenario:

When online memory is nearly full, the user can add a standby memory
block with memmap_on_memory enabled. This allows the system to avoid
consuming already scarce online memory for metadata.

After enabling and bringing that standby memory online, the user now
has enough free online memory to add additional memory blocks without
memmap_on_memory. These later blocks can provide physically contiguous
memory, which is important for workloads or devices requiring continuous
physical address space.
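
The contiguity point can be modelled in userspace as follows. This is a
simplified sketch of the physical layout only: it assumes the altmap sits at
the start of a block, and the block and memmap sizes are example values.

```python
def usable_ranges(blocks, block_size, memmap_size):
    """Merge the usable parts of memory blocks into contiguous ranges.

    blocks: iterable of (block_index, has_altmap) tuples.
    Returns a list of (start, end) byte ranges, end exclusive.
    """
    ranges = []
    for index, has_altmap in sorted(blocks):
        # With an altmap, the first memmap_size bytes hold struct pages.
        start = index * block_size + (memmap_size if has_altmap else 0)
        end = (index + 1) * block_size
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)  # extends the previous range
        else:
            ranges.append((start, end))
    return ranges


BLOCK = 128 << 20   # assumed memory block size
MEMMAP = 2 << 20    # assumed per-block memmap size

# Every block carries its own altmap: three separate usable ranges.
print(usable_ranges([(0, True), (1, True), (2, True)], BLOCK, MEMMAP))
# Only the first block carries an altmap: one contiguous usable range.
print(usable_ranges([(0, True), (1, False), (2, False)], BLOCK, MEMMAP))
```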

If my interpretation is correct, I see good potential for this to be
useful.

As you pointed out, how about having something similar to
73954d379efd ("dax: add a sysfs knob to control memmap_on_memory behavior")

i.e.

1) To configure/deconfigure a memory block
/sys/firmware/memory/memoryX/config

1 -> configure
0 -> deconfigure

2) Determine whether memory block should have memmap_on_memory or not.
/sys/firmware/memory/memoryX/memmap_on_memory
1 -> with altmap
0 -> without altmap

This attribute must be set before memoryX is configured. Otherwise, it
defaults to the CONFIG_MHP_MEMMAP_ON_MEMORY / memmap_on_memory parameter
setting.
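
From userspace, the intended ordering could look like the sketch below. The
attribute paths are the ones proposed in this mail and do not exist in any
kernel yet; the helper names and the root parameter are hypothetical and
only there so the sketch can be tried against a scratch directory.

```python
from pathlib import Path


def configure_block(root, block, memmap_on_memory=None):
    """Configure one memory block through the proposed (hypothetical) interface.

    root:  directory holding the memoryX entries, e.g. /sys/firmware/memory
    block: directory name of the block, e.g. "memory5"
    memmap_on_memory: True/False to choose explicitly, None to keep the default
    """
    base = Path(root) / block
    if memmap_on_memory is not None:
        # Must be written before "config", as described above.
        value = "1\n" if memmap_on_memory else "0\n"
        (base / "memmap_on_memory").write_text(value)
    (base / "config").write_text("1\n")


def deconfigure_block(root, block):
    """Deconfigure one memory block through the proposed interface."""
    (Path(root) / block / "config").write_text("0\n")
```

For example, configure_block("/sys/firmware/memory", "memory5", True) would
request memory5 with an altmap, while passing None leaves the choice to the
system-wide default.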


> > NOTE: “altmap_required” attribute must be set before setting the block as
> > configured via “config” attribute. (Dependency)
> > 
> > 4) Additionally add the patch to check if the memory block is configured
> > with altmap or not. Similar to [RFC PATCH 2/4] mm/memory_hotplug: Add
> > memory block altmap sysfs attribute.
> > 
> > Most of the code changes will be s390 specific with this interface.
> > 
> > Request your inputs on the potential interface. Thank you.
> > 
> > Other questions:
> > 1. I’m just wondering how variable-sized altmap grouping will be
> > structured in the future. Is it organized by grouping the memory blocks
> > that require altmap, with the first memory block storing the altmap
> > metadata for all of them? Or is it possible for the altmap metadata to
> > span across multiple memory blocks?
> 
> That exactly is unclear, which is why we should probably avoid doing that
> for now. Also, with other developments happening (memdesc), and ongoing
> effort to shrink "struct page", maybe we will not even need most of this in
> the future?
> 
> > 
> > 2. OR, will dedicated memory blocks be used exclusively for altmap
> > metadata, which the memory blocks requiring altmap would then consume? (To
> > prevent fragmentation) ?
> 
> One idea I had was that you would do the add_memory() in bigger granularity.
> 
> Then, the memory blocks hosting the memmap would have to get onlined first.
> And offlining of them would fail until all dependent ones were offlined.
> 
> That would at least limit the impact.
> 
> Then, the question would be, how could you "group" these memory blocks from
> your interface to do a single add_memory() etc.
> 
> But again, maybe we can leave that part out for now ...

Thank you David for the details. I will ignore/leave variable sized
altmap grouping for now.
