Message-ID: <aWEvuS95-7yP-Vc8@gourry-fedora-PF4VCD3F>
Date: Fri, 9 Jan 2026 11:41:29 -0500
From: Gregory Price <gourry@...rry.net>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: Hannes Reinecke <hare@...e.de>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com,
osalvador@...e.de, gregkh@...uxfoundation.org, rafael@...nel.org,
dakr@...nel.org, akpm@...ux-foundation.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com
Subject: Re: [RFC PATCH] memory,memory_hotplug: allow restricting memory
blocks to zone movable
On Thu, Jan 08, 2026 at 03:16:24PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/8/26 08:31, Hannes Reinecke wrote:
> > On 1/6/26 21:22, David Hildenbrand (Red Hat) wrote:
> > > On 1/6/26 20:59, Gregory Price wrote:
>
> > For hardware-based scenarios memory will always be removed in
> > larger entities (eg the CXL device), and it's always an 'all-or-nothing'
> > scenario; you cannot remove individual memory blocks on a CXL device.
> > So there the memory block abstraction makes less sense, and it
> > would be good to have a single 'knob' to remove the entire CXL
> > device and all memory blocks on it.
> > Sure, it might take some time, but one doesn't need to worry about
> > restoring the original state if the operation on one block fails.
>
> That's not what I was getting at:
>
> offline_and_remove_memory() can be called on large regions, and it properly
> handles whether we have to back out because some offlining failed.
>
> The issue arises once dax would have to call offline_and_remove_memory()
> multiple times, on non-contiguous areas. Of course, we could handle that by
> providing an interface that consumes multiple memory ranges.
>
> For the DAX use case, I think we'd really want a way to just use
>
> * add_and_online_memory() [does not exist yet, but ppc does something
> similar]
> * offline_and_remove_memory()
>
I'm starting to think this issue is actually the result of bad patterns
in the cxl driver - namely using dax as a path to hotplug sysram.

I suppose either we need a `cxl/dax_region/remove` that handles the
whole operation in one go, or we want `cxl/region/commit` to handle
hot(un)plug as a single action.
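
To make "a single action" over non-contiguous ranges a bit more concrete,
here is a rough userspace toy model in C. It is not kernel code and calls
no kernel API: `struct mem_range`, `range_offline()`, `range_online()` and
`offline_ranges_all_or_nothing()` are hypothetical names made up for this
sketch, and the `can_offline` flag only exists to simulate a failure. The
point is just the shape: try every range, and on the first failure put
back whatever this call already took offline.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory range; not a kernel structure. */
struct mem_range {
	unsigned long start;
	unsigned long size;
	bool online;      /* current state in this toy model */
	bool can_offline; /* simulation knob: will offlining succeed? */
};

/* Stand-in for offlining one range: 0 on success, -1 on failure. */
static int range_offline(struct mem_range *r)
{
	if (!r->can_offline)
		return -1;
	r->online = false;
	return 0;
}

/* Stand-in for bringing one range back online. */
static void range_online(struct mem_range *r)
{
	r->online = true;
}

/*
 * Offline every range or none of them.
 * Assumption: all ranges must be online on entry; otherwise nothing
 * is changed and -1 is returned.
 */
static int offline_ranges_all_or_nothing(struct mem_range *ranges, size_t n)
{
	size_t i;

	/* Refuse to start if any range is not online. */
	for (i = 0; i < n; i++) {
		if (!ranges[i].online)
			return -1;
	}

	for (i = 0; i < n; i++) {
		if (range_offline(&ranges[i]) != 0) {
			/* Roll back the ranges offlined so far. */
			while (i > 0)
				range_online(&ranges[--i]);
			return -1;
		}
	}
	return 0;
}
```

With three ranges where only the last one refuses to go offline, the call
returns -1 and all three are online again afterwards, so the caller never
has to restore state by hand.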

tl;dr: Split the dax use case from the sysram use case, and make a
cxl sysram driver directly manage hotplug rather than use dax.

~Gregory