Message-ID: <aXLAtVZ_bVwF9nBG@gourry-fedora-PF4VCD3F>
Date: Thu, 22 Jan 2026 19:28:37 -0500
From: Gregory Price <gourry@...rry.net>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: linux-cxl@...r.kernel.org, dan.j.williams@...el.com,
dave.jiang@...el.com, jonathan.cameron@...wei.com,
alison.schofield@...el.com, ira.weiny@...el.com, dave@...olabs.net,
linux-kernel@...r.kernel.org, kernel-team@...a.com,
vishal.l.verma@...el.com, benjamin.cheatham@....com,
David Rientjes <rientjes@...gle.com>
Subject: Re: cxl/region.c improvements and DAX/Hotplug plumbing
On Thu, Jan 22, 2026 at 11:14:15PM +0100, David Hildenbrand (Red Hat) wrote:
> Some of that (especially the interaction with core-mm) feels like it would
> be a good fit to discuss with the wider MM community in one of the bi-weekly
> mm meetings. (CCing David R.)
>
There is a monthly Linux-DAX meeting and a monthly Linux-CXL meeting;
obviously there is a lot of cross-attendance between the two.
Happy to attend additional discussion. I was trying to shore up some of
the cxl-region plumbing aspects before going wider.
> > - hiding memory blocks? (discussed in last meeting)
>
> What is that about and what was the result of that discussion? :)
>
It was just a question as to whether memory blocks are still useful
if the intent is to provide a collective hotplug interface. I don't
think there are any real proposals for this, just making note of it.
> > Solution 2: Make a dedicated sysram_region with policy
>
> What kind of region would that be?
It would be plumbing between the regionN and dax_region kobjects.
Right now the kobject relationship is:
region0 <- cxl driver created kobject
└dax_region0 <- default selects IORESOURCE_DAX_KMEM
└dax0.0 <- auto-probes on discovery
But there is baggage in the existing plumbing:
1) dax/cxl.c => hard-coded IORESOURCE_DAX_KMEM for dax_region
2) dax/bus.c => devdax is probed on discovery w/o manual bind step
3) cxl/core/region.c => BIOS-configured CXL regions automatically
generate a dax_region, and this auto-creates a dax_kmem device
which is subject to system-wide MHP policy.
This creates a backwards compatibility headache.
The same auto-plumbing is used in the manual creation path, so:
echo regionN > cxl/decoder0.0/create_ram_region
/* program decoders */
echo regionN > cxl/drivers/region/bind
will pump the whole thing directly into dax_kmem and auto-online
according to system default MHP policy. There's no intermediate
step in which the user can define preferences (unless you add
them as attributes to regionN - which is another option).
Adding the intermediate object:
regionN
└sysram_region <- encodes policy like hotplug and dax drv
└dax_regionN <- which would be passed here on creation
└dax0.0
lets the cxl-cli command be more expressive:
`cxl-cli create-region -t ram --driver=sysram` => kmem
`cxl-cli create-region -t ram --driver=dax` => device_dax
and would change the sysfs pattern to
echo regionN > cxl/decoder0.0/create_ram_region
echo regionN > cxl/drivers/sysram_region/bind
echo online_movable > cxl/devices/dax_regionN/hotplug
echo dax_regionN > cxl/drivers/dax_region/bind
and gives the user a chance to configure a policy before the region
is pumped all the way through to the endpoint dax driver.
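To make the ordering concrete, here is a minimal userspace sketch of the
proposed sequence (bind sysram_region, set a hotplug policy, then bind
dax_region). It is not kernel code. Every name in it (region_flow,
flow_bind_sysram, flow_set_hotplug, flow_bind_dax, the mhp_policy enum) is
hypothetical, and the rule that the policy is frozen after the final bind
is an assumption made for illustration, not something settled in this
thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy values; names are illustrative only. */
enum mhp_policy {
	MHP_SYSTEM_DEFAULT,
	MHP_OFFLINE,
	MHP_ONLINE,
	MHP_ONLINE_MOVABLE,
};

/* Hypothetical model of one region moving through the proposed steps. */
struct region_flow {
	int sysram_bound;        /* sysram_region driver bound */
	int dax_bound;           /* dax_region driver bound (final step) */
	enum mhp_policy policy;  /* what the user requested */
	enum mhp_policy applied; /* what was in effect at the final bind */
};

/* Step 1: bind the intermediate object; policy starts at system default. */
int flow_bind_sysram(struct region_flow *f)
{
	assert(f);
	if (f->sysram_bound)
		return -EBUSY;
	f->sysram_bound = 1;
	f->policy = MHP_SYSTEM_DEFAULT;
	return 0;
}

/* Step 2: the window in which the user can state a preference. */
int flow_set_hotplug(struct region_flow *f, enum mhp_policy p)
{
	assert(f);
	if (!f->sysram_bound)
		return -ENODEV;
	if (f->dax_bound)
		return -EBUSY; /* assumption: policy is frozen after final bind */
	f->policy = p;
	return 0;
}

/* Step 3: final bind; the policy chosen in step 2 is what gets applied. */
int flow_bind_dax(struct region_flow *f)
{
	assert(f);
	if (!f->sysram_bound)
		return -ENODEV;
	if (f->dax_bound)
		return -EBUSY;
	f->dax_bound = 1;
	f->applied = f->policy;
	return 0;
}
```

The point of the sketch is only that step 2 exists: in the current
auto-plumbed path there is no state between "region bound" and "memory
onlined" in which a preference could be recorded.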
(Much of the rest of this doc is QoL stuff that could be ignored)
> > Solution 2: dedicated sysram_region driver w/ or w/o DAX.
> > Can support sparseness w/o DAX (see DCD problem)
> > Could use DAX for tagged DCD regions.
> > Tradeoff: May duplicate some DAX logic.
>
> What would that look like?
For untagged extents w/o dax:
sysram_region->nr_range
sysram_region->ranges[0 : nr_range-1]
Extents in this list would be hotpluggable individually and
could be returned to the DCD device individually
sysram_region.c code would call hotplug directly, not via dax.
- hence, this duplicates some DAX logic
The above just prevents needlessly creating dax-indirection for sysram
extents with only one destination: add_memory_driver_managed()
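As a rough illustration of the untagged case, a userspace sketch of a
range list where each extent can be onlined and released on its own.
Only nr_range and ranges[] come from the description above; the fixed
capacity, the per-range online flag, and all function names are
hypothetical. The online helper only flips a flag where real code would
call add_memory_driver_managed().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYSRAM_MAX_RANGES 8 /* hypothetical capacity for this sketch */

struct sysram_range {
	uint64_t start; /* first byte of the extent */
	uint64_t end;   /* last byte of the extent (inclusive) */
	int online;     /* 1 once hotplugged, 0 otherwise */
};

struct sysram_region {
	size_t nr_range;
	struct sysram_range ranges[SYSRAM_MAX_RANGES];
};

/* Append one extent; returns its index, or -1 if the list is full. */
int sysram_add_extent(struct sysram_region *r, uint64_t start, uint64_t len)
{
	assert(len > 0);
	if (r->nr_range == SYSRAM_MAX_RANGES)
		return -1;
	r->ranges[r->nr_range] =
		(struct sysram_range){ start, start + len - 1, 0 };
	return (int)r->nr_range++;
}

/* Stand-in for the direct hotplug call; here it only marks the range. */
int sysram_online_extent(struct sysram_region *r, size_t idx)
{
	if (idx >= r->nr_range)
		return -1;
	r->ranges[idx].online = 1;
	return 0;
}

/*
 * Release a single extent back to the device. Real code would offline
 * and remove the memory first; this sketch only drops the list entry
 * and compacts the array.
 */
int sysram_release_extent(struct sysram_region *r, size_t idx)
{
	if (idx >= r->nr_range)
		return -1;
	memmove(&r->ranges[idx], &r->ranges[idx + 1],
		(r->nr_range - idx - 1) * sizeof(r->ranges[0]));
	r->nr_range--;
	return 0;
}
```

Releasing index 0 here leaves the other extents untouched, which is the
per-extent granularity described above.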
For tagged extents:
sysram_region->nr_regions
sysram_region->dax_regions[0 : nr_regions-1]
A set of tagged extents would only be hotpluggable as a group
and could only be returned to the DCD as a group.
it would also expose: dax0.0/uuid <- contains the tag
from this you get a cli command like
cxl release-extents regionN [--id=X] [--tag=Y]
translates to something like
echo "release" > regionN/sysram_region/extents/[X,Y]
Something like this.
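For the tagged case, a similarly rough userspace sketch of group release:
every extent carrying a given tag is released together. The flat array,
the 16-byte tag, and all names here (tagged_extent, tagged_region,
tagged_add, tagged_release_group) are hypothetical, and the sketch models
the tag directly on each extent rather than via per-tag dax_regions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_LEN     16 /* assumption: tag is a 16-byte UUID */
#define MAX_EXTENTS 16 /* hypothetical capacity for this sketch */

struct tagged_extent {
	uint8_t tag[TAG_LEN];
	uint64_t start;
	uint64_t len;
};

struct tagged_region {
	size_t nr_extents;
	struct tagged_extent extents[MAX_EXTENTS];
};

/* Record one extent under a tag; returns 0, or -1 if the list is full. */
int tagged_add(struct tagged_region *r, const uint8_t tag[TAG_LEN],
	       uint64_t start, uint64_t len)
{
	struct tagged_extent *e;

	assert(len > 0);
	if (r->nr_extents == MAX_EXTENTS)
		return -1;
	e = &r->extents[r->nr_extents++];
	memcpy(e->tag, tag, TAG_LEN);
	e->start = start;
	e->len = len;
	return 0;
}

/*
 * Release every extent that carries the given tag, as a group.
 * Returns the number of extents released; the rest keep their order.
 */
size_t tagged_release_group(struct tagged_region *r,
			    const uint8_t tag[TAG_LEN])
{
	size_t kept = 0, released = 0;

	for (size_t i = 0; i < r->nr_extents; i++) {
		if (memcmp(r->extents[i].tag, tag, TAG_LEN) == 0) {
			released++;
			continue;
		}
		r->extents[kept++] = r->extents[i];
	}
	r->nr_extents = kept;
	return released;
}
```

There is deliberately no per-extent release function in this sketch,
mirroring the constraint that tagged extents go back to the DCD only as a
group.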
> >
> > Solution 4: Prevent non-driver actions from changing state.
> > Also solves hotplug protection problem (see next)
>
> The crucial part is solving what you spelled out in the description: "race
> conditions". Forbidding someone to re-configure system RAM sounds
> unnecessary.
>
> For example, I use it a lot for testing issues with page migration while
> offlining memory from ZONE_MOVABLE.
>
For most use-cases yes. For something like FAMFS (distributed shared
memory), one system onlining a block as kmem could be potentially
destructive to an entirely separate physical server.
A small guardrail to prevent silly mistakes, but certainly not required.
Probably not needed for sysram and normal dax regions.
But fair, I can drop this. If an actual issue shows up, this can be
restricted with memory_notifier pretty trivially.
> > Example: Slow(er) memory
> > Some memory is "just memory", but might be particularly slow and
> > intended for use as a filesystem backend or as only a demotion
> > target. Otherwise it's allocated / mapped like any other memory,
> > but it still requires isolation, so it is isolated to the demotion path
> > and is not a fallback allocation target.
>
> That doesn't quite fit the description of N_PRIVATE_MEMORY, though. Or what
> am I missing?
I suppose we could also explore a per-node fallback policy to accomplish
this - but there was also the LPC talk about trying to deprecate that
entirely.
For the filesystem piece, you're probably right.
~Gregory