Message-ID: <9e152d8d-4b39-4a6c-93be-694a28686c07@redhat.com>
Date: Mon, 28 Jul 2025 11:42:58 +0200
From: David Hildenbrand <david@...hat.com>
To: Hannes Reinecke <hare@...e.de>, Oscar Salvador <osalvador@...e.de>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Michal Hocko <mhocko@...e.com>, Hannes Reinecke <hare@...nel.org>
Subject: Re: [RFC] Disable auto_movable_ratio for selfhosted memmap
On 28.07.25 11:28, Hannes Reinecke wrote:
> On 7/28/25 10:44, David Hildenbrand wrote:
>> On 28.07.25 10:15, Oscar Salvador wrote:
>>> Hi,
> [ .. ]
>>>
>>> One way to tackle this would be update the ratio every time a new CXL
>>> card gets inserted, but this seems suboptimal.
>>> Another way is that since CXL memory works with selfhosted memmap, we
>>> could relax
>>> the check when 'auto-movable' and only look at the ratio if we aren't
>>> working with selfhosted memmap.
>>
>> The memmap is only a small piece of unmovable data we require late at
>> runtime (a bigger factor is user space page tables actually mapping that
>> memory). The zone ratio we have configured in the kernel dates back to
>> the highmem times, where such ratios were considered safe. Maybe there
>> are better defaults for the ratios today, but it really depends on the
>> workload.
>>
> Point is, the ratio is accounted for the _entire_ memory.
> Which means that you have to _know_ how much memory you are going to
> plug in prior to plugging that in.
> So to make that correct one would need to update the ratio prior to
> plug in one module, check if that succeeded, update the ratio, plug
> in the next module, check that, etc.
I am confused. We know how big a DIMM is at the time we plug it. I
assume you talk about CXL?
Can you describe what that workflow would look like with tools like daxctl?
(what is a "module"? A DIMM?)
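To make the ratio discussion concrete, here is a rough sketch of the kind of check the auto-movable policy applies. It is a simplification, not the kernel code: the function names are made up for illustration, the value 301 is what I believe the default of memory_hotplug.auto_movable_ratio to be (a percentage, i.e. roughly 3:1 MOVABLE:KERNEL), and memory groups and per-node accounting are ignored.

```python
def can_online_movable(kernel_pages, movable_pages, new_pages, ratio_percent=301):
    """Return True if new_pages may go to ZONE_MOVABLE without exceeding the ratio.

    Simplified model: movable memory after the hotplug must not exceed
    ratio_percent percent of the memory in kernel zones.
    """
    return movable_pages + new_pages <= (ratio_percent * kernel_pages) // 100


def max_movable_pages(kernel_pages, movable_pages, ratio_percent=301):
    """Number of additional pages that still fit into ZONE_MOVABLE."""
    return max(0, (ratio_percent * kernel_pages) // 100 - movable_pages)


PAGE = 4096            # assumed base page size
GiB = 2**30

kernel = 64 * GiB // PAGE       # hypothetical machine: 64 GiB of boot memory
module = 128 * GiB // PAGE      # hypothetical 128 GiB device

first = can_online_movable(kernel, 0, module)         # first device fits
second = can_online_movable(kernel, module, module)   # second one does not
```

In this model the first 128 GiB device goes to ZONE_MOVABLE, but the second one only would after the ratio has been raised, which is the sequencing problem described above.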
>
>> One could find ways of subtracting the selfhosted part, to account it
>> differently in the kernel, but the memmap is not the only consumer that
>> affects the ratio.
>>
>> I mean, the memmap is roughly 1.6%, I don't think that really makes a
>> difference for you, does it? Can you share some real-life examples?
>>
>>
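For reference, the "roughly 1.6%" figure can be reproduced with a few lines of arithmetic. This assumes a 64-byte struct page and 4 KiB base pages, which is typical for x86-64 but not universal; the function name is only illustrative.

```python
STRUCT_PAGE_BYTES = 64   # assumed sizeof(struct page)
PAGE_BYTES = 4096        # assumed base page size


def memmap_bytes(memory_bytes, struct_page_bytes=STRUCT_PAGE_BYTES,
                 page_bytes=PAGE_BYTES):
    """Bytes of memmap needed to describe memory_bytes of memory."""
    return (memory_bytes // page_bytes) * struct_page_bytes


GiB = 2**30
overhead = memmap_bytes(128 * GiB)        # memmap for a 128 GiB device
fraction = overhead / (128 * GiB)         # 64 / 4096 = 0.015625
```

Under these assumptions a 128 GiB device needs 2 GiB of memmap, about 1.56% of its capacity.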
>> I have a colleague working on one of my old prototypes (memoryhotplugd)
>> for replacing udev rules.
>>
>> The idea there is, to detect that CXL memory is getting hotplugged and
>> keep it offline. Because user space hotplugging that memory (daxctl)
>> will explicitly online it to the proper zone.
>>
>> Things like virtio-mem, DIMMs etc can happily use the auto-movable
>> behavior. But the auto-movable behavior doesn't quite make sense if (a)
>> you want everything movable and (b) daxctl already expects to online the
>> memory itself, usually to ZONE_MOVABLE.
>>
>> So I think this is mostly a user-space problem to solve.
>>
> Hmm.
> Yes, and no.
>
> While CXL memory is hotpluggable (it's a PCI device, after all),
> it won't be hotplugged on a regular basis.
I've been told that with dynamic memory pooling it is supposed to get
much more dynamic.
> So the current use-case I'm aware of is that the system will be
> configured once, and then it will be expected to come up in the
> very same state after reboot.
> As such a daemon is a bit of an overkill, as the number of events
> it would need to listen to is in the very low single-digit range.
I am mostly concerned with all the use cases that existed before CXL (in
particular, virtio-mem, standby memory on s390x, DIMMs) where you see
memory hotplug way more frequently and also would want to deal with
things such as memory onlining failing in some environments more
gracefully (e.g., retry).
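As a minimal sketch of what "retry" could mean for user-space onlining: memory blocks are onlined by writing a state such as "online_movable" to /sys/devices/system/memory/memoryN/state. The helper name, the retry count and the delay below are assumptions for illustration, not what the daemon actually does.

```python
import time
from pathlib import Path


def online_memory_block(state_file, state="online_movable", attempts=3, delay_s=0.5):
    """Try to online one memory block, retrying on failure.

    state_file is the block's sysfs 'state' attribute. Returns True once a
    write succeeds, False if every attempt raised an OSError.
    """
    for attempt in range(attempts):
        try:
            Path(state_file).write_text(state)
            return True
        except OSError:
            # Onlining can fail temporarily; wait before the next attempt.
            if attempt + 1 < attempts:
                time.sleep(delay_s)
    return False
```

A caller would pass something like "/sys/devices/system/memory/memory42/state" (block number chosen arbitrarily here) and decide what to do when the function returns False.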
What I realized is that
(1) udev rules are not a good fit for all use cases
(2) auto-onlining in the kernel is not a good fit for all use cases
The goal of the daemon will be to configure auto-onlining in the kernel
where possible (e.g., only virtio-mem, only CXL), but fall back to manual
onlining in case mixtures might be possible (CXL and virtio-mem etc). I
expect the latter to be rare, but sometimes we can't make a fully
reliable decision of what might get hotplugged in the future ...
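Very roughly, the decision could look like the sketch below. Everything in it is hypothetical: the source names, the function name, and the per-source choices are placeholders. The returned strings are values that /sys/devices/system/memory/auto_online_blocks accepts.

```python
# Hypothetical mapping from a single expected hotplug source to the value
# the daemon would write to auto_online_blocks.
KERNEL_POLICY = {
    "virtio-mem": "online",         # let the kernel pick the zone
    "cxl": "online_movable",        # assumption: everything movable
}


def choose_onlining(expected_sources):
    """Return (who_onlines, auto_online_blocks_value).

    With exactly one known source, let the kernel auto-online. With a
    mixture, or when nothing is known, keep new memory offline and let the
    daemon online it manually.
    """
    sources = set(expected_sources)
    if len(sources) == 1:
        (source,) = sources
        if source in KERNEL_POLICY:
            return ("kernel", KERNEL_POLICY[source])
    return ("manual", "offline")
```

For example, choose_onlining({"virtio-mem"}) selects kernel auto-onlining, while choose_onlining({"cxl", "virtio-mem"}) falls back to manual onlining.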
--
Cheers,
David / dhildenb