Message-ID: <4057479d-6ece-49a2-b823-99748e8c9c35@redhat.com>
Date: Mon, 28 Jul 2025 10:44:13 +0200
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Michal Hocko <mhocko@...e.com>, Hannes Reinecke <hare@...nel.org>
Subject: Re: [RFC] Disable auto_movable_ratio for selfhosted memmap
On 28.07.25 10:15, Oscar Salvador wrote:
> Hi,
Hi,
>
> Currently, we have several mechanisms to pick a zone for the new memory we are
> onlining.
> Eventually, we will land on zone_for_pfn_range() which will pick the zone.
>
> Two of these mechanisms are 'movable_node' and the 'auto-movable' policy.
> The former will put all hotplugged memory in ZONE_MOVABLE
> (unless we can keep zones contiguous by not doing so), while the latter
> will put it in ZONE_MOVABLE IFF we are within the established
> MOVABLE:KERNEL ratio.
It's more complicated, because we have the concept of memory groups.
Dynamic memory groups allow for a mixture of MOVABLE vs. NORMAL within
the group, static memory groups want a single type.
Hotplugging a large DIMM would online it either as MOVABLE or NORMAL.
Similarly with CXL.
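To make the ratio and memory-group interaction concrete, here is a deliberately simplified model of the auto-movable decision. It is not the kernel's code: the real logic in mm/memory_hotplug.c also considers NUMA nodes and other details. The names (pick_zone, struct mem_group_model, ZONE_CHOICE_*) are hypothetical. The model assumes sizes in pages and a ratio given in percent, as with the memory_hotplug.auto_movable_ratio parameter. A static group is judged by its total size, so it ends up entirely MOVABLE or entirely NORMAL. A dynamic group is judged block by block.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the 'auto-movable' online policy.
 * NOT the kernel's implementation; all sizes are in pages.
 */
enum zone_choice { ZONE_CHOICE_NORMAL, ZONE_CHOICE_MOVABLE };

struct mem_group_model {
	bool is_static;       /* static group: the whole group gets one zone */
	uint64_t total_pages; /* size of the entire group */
};

/* True if adding new_pages keeps MOVABLE:KERNEL at or below ratio_pct %. */
static bool within_ratio(uint64_t kernel_pages, uint64_t movable_pages,
			 uint64_t new_pages, unsigned int ratio_pct)
{
	return (movable_pages + new_pages) * 100 <=
	       kernel_pages * (uint64_t)ratio_pct;
}

/*
 * Pick a zone for one memory block. A static group (e.g. a whole DIMM or
 * CXL card) is judged by its total size; a dynamic group is judged per
 * block. For a static group, movable_pages must not include the group's
 * own pages.
 */
static enum zone_choice pick_zone(const struct mem_group_model *group,
				  uint64_t block_pages, uint64_t kernel_pages,
				  uint64_t movable_pages, unsigned int ratio_pct)
{
	uint64_t pages = (group && group->is_static) ? group->total_pages
						     : block_pages;

	return within_ratio(kernel_pages, movable_pages, pages, ratio_pct)
		       ? ZONE_CHOICE_MOVABLE
		       : ZONE_CHOICE_NORMAL;
}
```

As an illustrative example (not measured data), take 16 GiB of kernel memory (4194304 pages of 4 KiB) and a ratio of 301%. The model then allows about 48 GiB of MOVABLE memory. A 64 GiB static group lands entirely in NORMAL. A dynamic group can keep getting individual 128 MiB blocks onlined as MOVABLE until the limit is reached.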
>
> It seems the latter doesn't play well with CXL memory: CXL cards hold really
> large amounts of memory, which makes the ratio check fail, and since a CXL card
> must be removed as a unit, it can't be removed if any of its memory blocks
> landed in a !ZONE_MOVABLE zone.
So, user space configured a ratio and the kernel does exactly that: obey
the ratio.
>
> One way to tackle this would be to update the ratio every time a new CXL
> card gets inserted, but this seems suboptimal.
> Another way: since CXL memory works with selfhosted memmap, we could relax
> the 'auto-movable' check and only look at the ratio if we aren't
> working with selfhosted memmap.
The memmap is only a small piece of unmovable data we require late at
runtime (a bigger factor is user space page tables actually mapping that
memory). The zone ratio we have configured in the kernel dates back to
the highmem times, where such ratios were considered safe. Maybe there
are better defaults for the ratios today, but it really depends on the
workload.
One could find ways of subtracting the selfhosted part, to account for it
differently in the kernel, but the memmap is not the only consumer that
affects the ratio.
I mean, the memmap is roughly 1.6% of the memory it describes; I don't think
that really makes a difference for you, does it? Can you share some real-life examples?
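The 1.6% figure comes from the size of struct page relative to the page it describes. The sketch below assumes the common x86-64 configuration of 4 KiB base pages and a 64-byte struct page. Both values vary with architecture and kernel config. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed configuration (typical x86-64); both values vary by arch/config. */
#define MODEL_PAGE_SIZE        4096u /* base page size in bytes */
#define MODEL_STRUCT_PAGE_SIZE 64u   /* assumed sizeof(struct page) in bytes */

/* Bytes of memmap (one struct page per base page) for 'bytes' of memory. */
static uint64_t memmap_bytes(uint64_t bytes)
{
	return bytes / MODEL_PAGE_SIZE * MODEL_STRUCT_PAGE_SIZE;
}

/* Memmap overhead in basis points (1/100 of a percent), rounded down. */
static unsigned int memmap_overhead_bp(void)
{
	return MODEL_STRUCT_PAGE_SIZE * 10000u / MODEL_PAGE_SIZE;
}
```

Here memmap_overhead_bp() yields 156, i.e. 1.56%, and a hypothetical 64 GiB card needs 1 GiB of memmap. Subtracting the selfhosted memmap would therefore shift the ratio calculation by only about 1.6%. That hardly matters when a card exceeds the configured ratio by a wide margin.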
I have a colleague working on one of my old prototypes (memoryhotplugd)
for replacing udev rules.
The idea there is to detect that CXL memory is getting hotplugged and
keep it offline, because the user space tool hotplugging that memory (daxctl)
will explicitly online it to the proper zone.
Things like virtio-mem, DIMMs etc. can happily use the auto-movable
behavior. But the auto-movable behavior doesn't quite make sense if (a)
you want everything movable and (b) daxctl already expects to online the
memory itself, usually to ZONE_MOVABLE.
So I think this is mostly a user-space problem to solve.
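From user space, onlining a block to a specific zone goes through the kernel's sysfs memory-block interface. Writing "online_movable" to /sys/devices/system/memory/memoryN/state onlines block N to ZONE_MOVABLE. Setting /sys/devices/system/memory/auto_online_blocks to "offline" keeps newly hotplugged blocks offline in the first place. The mail does not describe how memoryhotplugd or daxctl are implemented. The following is only a minimal sketch of that final onlining step, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<root>/memory<id>/state"; 0 on success, -1 if truncated. */
static int memory_state_path(char *buf, size_t len, const char *root,
			     unsigned long block_id)
{
	int n = snprintf(buf, len, "%s/memory%lu/state", root, block_id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: ask the kernel to online one memory block to
 * ZONE_MOVABLE. On a real system root is "/sys/devices/system/memory"
 * and this needs root privileges. Returns 0 on success, -1 on any error.
 */
static int online_block_movable(const char *root, unsigned long block_id)
{
	static const char req[] = "online_movable";
	char path[512];
	FILE *f;

	if (memory_state_path(path, sizeof(path), root, block_id) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1; /* no such block or insufficient privileges */

	if (fwrite(req, 1, strlen(req), f) != strlen(req)) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the data and it reaches sysfs on flush, so a refusal
	 * by the kernel (e.g. -EINVAL) is reported by fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}
```

A daemon built this way would call online_block_movable("/sys/devices/system/memory", N) for each block of the hotplugged device. Zone placement is then an explicit user-space decision, and no kernel ratio is involved.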
--
Cheers,
David / dhildenb