[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c1d7d137-b7c2-4713-8ca4-33b6bc2bea2b@amd.com>
Date: Fri, 30 Jan 2026 15:27:12 -0600
From: "Cheatham, Benjamin" <benjamin.cheatham@....com>
To: Gregory Price <gourry@...rry.net>, <linux-mm@...ck.org>
CC: <linux-cxl@...r.kernel.org>, <nvdimm@...ts.linux.dev>,
<linux-fsdevel@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <kernel-team@...a.com>, <dave@...olabs.net>,
<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
<alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
<ira.weiny@...el.com>, <dan.j.williams@...el.com>, <willy@...radead.org>,
<jack@...e.cz>, <terry.bowman@....com>, <john@...alactic.com>
Subject: Re: [PATCH 8/9] cxl/core: Add dax_kmem_region and sysram_region
drivers
On 1/29/2026 3:04 PM, Gregory Price wrote:
> In the current kmem driver binding process, the only way for users
> to define hotplug policy is via a build-time option, or by not
> onlining memory by default and setting each individual memory block
> online after hotplug occurs. We can solve this with a configuration
> step between region-probe and dax-probe.
>
> Add the infrastructure for a two-stage driver binding for kmem-mode
> dax regions. The cxl_dax_kmem_region driver probes cxl_sysram_region
> devices and creates cxl_dax_region with dax_driver=kmem.
>
> This creates an interposition step where users can configure policy.
>
> Device hierarchy:
> region0 -> sysram_region0 -> dax_region0 -> dax0.0
This technically comes up in the devdax_region driver patch first, but I noticed it here
so this is where I'm putting it:
I like the idea here, but the implementation is all off. Firstly, devm_cxl_add_sysram_region()
is never called outside of sysram_region_driver::probe(), so I'm not sure how they ever get
added to the system (same with devdax regions).
Second, there's this weird pattern of adding sub-region (sysram, devdax, etc.) devices being added
inside of the sub-region driver probe. I would expect the devices are added then the probe function
is called. What I think should be going on here (and correct me if I'm wrong) is:
1. a cxl_region device is added to the system
2. cxl_region::probe() is called on said device (one in cxl/core/region.c)
3. Said probe function figures out the device is a dax_region or whatever else and creates that type of region device
(i.e. cxl_region::probe() -> device_add(&cxl_sysram_device))
4. if the device's dax driver type is DAXDRV_DEVICE_TYPE it gets sent to the daxdev_region driver
5a. if the device's dax driver type is DAXDRV_KMEM_TYPE it gets sent to the sysram_region driver which holds it until
the online_type is set
5b. Once the online_type is set, the device is forwarded to the dax_kmem_region driver? Not sure on this part
What seems to be happening is that the cxl_region is added, all of these region drivers try
to bind to it since they all use the same device id (CXL_DEVICE_REGION) and the correct one is
figured out by magic? I'm somewhat confused at this point :/.
>
> The sysram_region device exposes a sysfs 'online_type' attribute
> that allows users to configure the memory online type before the
> underlying dax_region is created and memory is hotplugged.
>
> sysram_region0/online_type:
> invalid: not configured, blocks probe
> offline: memory will not be onlined automatically
> online: memory will be onlined in ZONE_NORMAL
> online_movable: memory will be onlined in ZONE_MMOVABLE
>
> The device initializes with online_type=invalid which prevents the
> cxl_dax_kmem_region driver from binding until the user explicitly
> configures a valid online_type.
>
> This enables a two-step binding process:
> echo region0 > cxl_sysram_region/bind
> echo online_movable > sysram_region0/online_type
> echo sysram_region0 > cxl_dax_kmem_region/bind
>
> Signed-off-by: Gregory Price <gourry@...rry.net>
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 21 +++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 6 +
> drivers/cxl/core/dax_region.c | 50 +++++++
> drivers/cxl/core/port.c | 2 +
> drivers/cxl/core/region.c | 14 ++
> drivers/cxl/core/sysram_region.c | 180 ++++++++++++++++++++++++
> drivers/cxl/cxl.h | 25 ++++
> 8 files changed, 299 insertions(+)
> create mode 100644 drivers/cxl/core/sysram_region.c
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index c80a1b5a03db..a051cb86bdfc 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -624,3 +624,24 @@ Description:
> The count is persistent across power loss and wraps back to 0
> upon overflow. If this file is not present, the device does not
> have the necessary support for dirty tracking.
> +
> +
> +What: /sys/bus/cxl/devices/sysram_regionZ/online_type
> +Date: January, 2026
> +KernelVersion: v7.1
> +Contact: linux-cxl@...r.kernel.org
> +Description:
> + (RW) This attribute allows users to configure the memory online
> + type before the underlying dax_region engages in hotplug.
> +
> + Valid values:
> + 'invalid': Not configured (default). Blocks probe.
This should be removed from the valid values section since it's not a valid value
to write to the attribute. The mention of the default in the paragraph below should
be enough.
> + 'offline': Memory will not be onlined automatically.
> + 'online' : Memory will be onlined in ZONE_NORMAL.
> + 'online_movable': Memory will be onlined in ZONE_MOVABLE.
> +
> + The device initializes with online_type='invalid' which prevents
> + the cxl_dax_kmem_region driver from binding until the user
> + explicitly configures a valid online_type. This enables a
> + two-step binding process that gives users control over memory
> + hotplug policy before memory is added to the system.
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 36f284d7c500..faf662c7d88b 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -18,6 +18,7 @@ cxl_core-y += ras.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> cxl_core-$(CONFIG_CXL_REGION) += dax_region.o
> +cxl_core-$(CONFIG_CXL_REGION) += sysram_region.o
> cxl_core-$(CONFIG_CXL_REGION) += pmem_region.o
> cxl_core-$(CONFIG_CXL_MCE) += mce.o
> cxl_core-$(CONFIG_CXL_FEATURES) += features.o
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index ea4df8abc2ad..04b32015e9b1 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -26,6 +26,7 @@ extern struct device_attribute dev_attr_delete_region;
> extern struct device_attribute dev_attr_region;
> extern const struct device_type cxl_pmem_region_type;
> extern const struct device_type cxl_dax_region_type;
> +extern const struct device_type cxl_sysram_region_type;
> extern const struct device_type cxl_region_type;
>
> int cxl_decoder_detach(struct cxl_region *cxlr,
> @@ -37,6 +38,7 @@ int cxl_decoder_detach(struct cxl_region *cxlr,
> #define SET_CXL_REGION_ATTR(x) (&dev_attr_##x.attr),
> #define CXL_PMEM_REGION_TYPE(x) (&cxl_pmem_region_type)
> #define CXL_DAX_REGION_TYPE(x) (&cxl_dax_region_type)
> +#define CXL_SYSRAM_REGION_TYPE(x) (&cxl_sysram_region_type)
> int cxl_region_init(void);
> void cxl_region_exit(void);
> int cxl_get_poison_by_endpoint(struct cxl_port *port);
> @@ -44,9 +46,12 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
> u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
> u64 dpa);
> int devm_cxl_add_dax_region(struct cxl_region *cxlr, enum dax_driver_type);
> +int devm_cxl_add_sysram_region(struct cxl_region *cxlr);
> int devm_cxl_add_pmem_region(struct cxl_region *cxlr);
>
> extern struct cxl_driver cxl_devdax_region_driver;
> +extern struct cxl_driver cxl_dax_kmem_region_driver;
> +extern struct cxl_driver cxl_sysram_region_driver;
>
> #else
> static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
> @@ -81,6 +86,7 @@ static inline void cxl_region_exit(void)
> #define SET_CXL_REGION_ATTR(x)
> #define CXL_PMEM_REGION_TYPE(x) NULL
> #define CXL_DAX_REGION_TYPE(x) NULL
> +#define CXL_SYSRAM_REGION_TYPE(x) NULL
> #endif
>
> struct cxl_send_command;
> diff --git a/drivers/cxl/core/dax_region.c b/drivers/cxl/core/dax_region.c
> index 391d51e5ec37..a379f5b85e3d 100644
> --- a/drivers/cxl/core/dax_region.c
> +++ b/drivers/cxl/core/dax_region.c
> @@ -127,3 +127,53 @@ struct cxl_driver cxl_devdax_region_driver = {
> .probe = cxl_devdax_region_driver_probe,
> .id = CXL_DEVICE_REGION,
> };
> +
> +static int cxl_dax_kmem_region_driver_probe(struct device *dev)
> +{
> + struct cxl_sysram_region *cxlr_sysram = to_cxl_sysram_region(dev);
> + struct cxl_dax_region *cxlr_dax;
> + struct cxl_region *cxlr;
> + int rc;
> +
> + if (!cxlr_sysram)
> + return -ENODEV;
> +
> + /* Require explicit online_type configuration before binding */
> + if (cxlr_sysram->online_type == -1)
> + return -ENODEV;
> +
> + cxlr = cxlr_sysram->cxlr;
> +
> + cxlr_dax = cxl_dax_region_alloc(cxlr);
> + if (IS_ERR(cxlr_dax))
> + return PTR_ERR(cxlr_dax);
You can use cleanup.h here to remove the goto's (I think). Following should work:
#DEFINE_FREE(cxlr_dax_region_put, struct cxl_dax_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&cxlr_dax->dev))
static int cxl_dax_kmem_region_driver_probe(struct device *dev)
{
...
struct cxl_dax_region *cxlr_dax __free(cxlr_dax_region_put) = cxl_dax_region_alloc(cxlr);
if (IS_ERR(cxlr_dax))
return PTR_ERR(cxlr_dax);
...
rc = dev_set_name(&cxlr_dax->dev, "dax_region%d", cxlr->id);
if (rc)
return rc;
rc = device_add(&cxlr_dax->dev);
if (rc)
return rc;
dev_dbg(dev, "%s: register %s\n", dev_name(dev), dev_name(&cxlr_dax->dev));
return devm_add_action_or_reset(dev, cxlr_dax_unregister, no_free_ptr(cxlr_dax));
}
> +
> + /* Inherit online_type from parent sysram_region */
> + cxlr_dax->online_type = cxlr_sysram->online_type;
> + cxlr_dax->dax_driver = DAXDRV_KMEM_TYPE;
> +
> + /* Parent is the sysram_region device */
> + cxlr_dax->dev.parent = dev;
> +
> + rc = dev_set_name(&cxlr_dax->dev, "dax_region%d", cxlr->id);
> + if (rc)
> + goto err;
> +
> + rc = device_add(&cxlr_dax->dev);
> + if (rc)
> + goto err;
> +
> + dev_dbg(dev, "%s: register %s\n", dev_name(dev),
> + dev_name(&cxlr_dax->dev));
> +
> + return devm_add_action_or_reset(dev, cxlr_dax_unregister, cxlr_dax);
> +err:
> + put_device(&cxlr_dax->dev);
> + return rc;
> +}
> +
> +struct cxl_driver cxl_dax_kmem_region_driver = {
> + .name = "cxl_dax_kmem_region",
> + .probe = cxl_dax_kmem_region_driver_probe,
> + .id = CXL_DEVICE_SYSRAM_REGION,
> +};
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 3310dbfae9d6..dc7262a5efd6 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -66,6 +66,8 @@ static int cxl_device_id(const struct device *dev)
> return CXL_DEVICE_PMEM_REGION;
> if (dev->type == CXL_DAX_REGION_TYPE())
> return CXL_DEVICE_DAX_REGION;
> + if (dev->type == CXL_SYSRAM_REGION_TYPE())
> + return CXL_DEVICE_SYSRAM_REGION;
> if (is_cxl_port(dev)) {
> if (is_cxl_root(to_cxl_port(dev)))
> return CXL_DEVICE_ROOT;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6200ca1cc2dd..8bef91dc726c 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3734,8 +3734,20 @@ int cxl_region_init(void)
> if (rc)
> goto err_dax;
>
> + rc = cxl_driver_register(&cxl_sysram_region_driver);
> + if (rc)
> + goto err_sysram;
> +
> + rc = cxl_driver_register(&cxl_dax_kmem_region_driver);
> + if (rc)
> + goto err_dax_kmem;
> +
> return 0;
>
> +err_dax_kmem:
> + cxl_driver_unregister(&cxl_sysram_region_driver);
> +err_sysram:
> + cxl_driver_unregister(&cxl_devdax_region_driver);
> err_dax:
> cxl_driver_unregister(&cxl_region_driver);
> return rc;
> @@ -3743,6 +3755,8 @@ int cxl_region_init(void)
>
> void cxl_region_exit(void)
> {
> + cxl_driver_unregister(&cxl_dax_kmem_region_driver);
> + cxl_driver_unregister(&cxl_sysram_region_driver);
> cxl_driver_unregister(&cxl_devdax_region_driver);
> cxl_driver_unregister(&cxl_region_driver);
> }
> diff --git a/drivers/cxl/core/sysram_region.c b/drivers/cxl/core/sysram_region.c
> new file mode 100644
> index 000000000000..5665db238d0f
> --- /dev/null
> +++ b/drivers/cxl/core/sysram_region.c
> @@ -0,0 +1,180 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2026 Meta Platforms, Inc. All rights reserved. */
> +/*
> + * CXL Sysram Region - Intermediate device for kmem hotplug configuration
> + *
> + * This provides an intermediate device between cxl_region and cxl_dax_region
> + * that allows users to configure memory hotplug parameters (like online_type)
> + * before the underlying dax_region is created and memory is hotplugged.
> + */
> +
> +#include <linux/memory_hotplug.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <cxlmem.h>
> +#include <cxl.h>
> +#include "core.h"
> +
> +static void cxl_sysram_region_release(struct device *dev)
> +{
> + struct cxl_sysram_region *cxlr_sysram = to_cxl_sysram_region(dev);
> +
> + kfree(cxlr_sysram);
> +}
> +
> +static ssize_t online_type_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cxl_sysram_region *cxlr_sysram = to_cxl_sysram_region(dev);
> +
> + switch (cxlr_sysram->online_type) {
> + case MMOP_OFFLINE:
> + return sysfs_emit(buf, "offline\n");
> + case MMOP_ONLINE:
> + return sysfs_emit(buf, "online\n");
> + case MMOP_ONLINE_MOVABLE:
> + return sysfs_emit(buf, "online_movable\n");
> + default:
> + return sysfs_emit(buf, "invalid\n");
> + }
> +}
> +
> +static ssize_t online_type_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct cxl_sysram_region *cxlr_sysram = to_cxl_sysram_region(dev);
> +
> + if (sysfs_streq(buf, "offline"))
> + cxlr_sysram->online_type = MMOP_OFFLINE;
> + else if (sysfs_streq(buf, "online"))
> + cxlr_sysram->online_type = MMOP_ONLINE;
> + else if (sysfs_streq(buf, "online_movable"))
> + cxlr_sysram->online_type = MMOP_ONLINE_MOVABLE;
> + else
> + return -EINVAL;
> +
> + return len;
> +}
> +
> +static DEVICE_ATTR_RW(online_type);
> +
> +static struct attribute *cxl_sysram_region_attrs[] = {
> + &dev_attr_online_type.attr,
> + NULL,
> +};
> +
> +static const struct attribute_group cxl_sysram_region_attribute_group = {
> + .attrs = cxl_sysram_region_attrs,
> +};
> +
> +static const struct attribute_group *cxl_sysram_region_attribute_groups[] = {
> + &cxl_base_attribute_group,
> + &cxl_sysram_region_attribute_group,
> + NULL,
> +};
> +
> +const struct device_type cxl_sysram_region_type = {
> + .name = "cxl_sysram_region",
> + .release = cxl_sysram_region_release,
> + .groups = cxl_sysram_region_attribute_groups,
> +};
> +
> +static bool is_cxl_sysram_region(struct device *dev)
> +{
> + return dev->type == &cxl_sysram_region_type;
> +}
> +
> +struct cxl_sysram_region *to_cxl_sysram_region(struct device *dev)
> +{
> + if (dev_WARN_ONCE(dev, !is_cxl_sysram_region(dev),
> + "not a cxl_sysram_region device\n"))
> + return NULL;
> + return container_of(dev, struct cxl_sysram_region, dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(to_cxl_sysram_region, "CXL");
> +
> +static struct lock_class_key cxl_sysram_region_key;
> +
> +static struct cxl_sysram_region *cxl_sysram_region_alloc(struct cxl_region *cxlr)
> +{
> + struct cxl_region_params *p = &cxlr->params;
> + struct cxl_sysram_region *cxlr_sysram;
> + struct device *dev;
> +
> + guard(rwsem_read)(&cxl_rwsem.region);
> + if (p->state != CXL_CONFIG_COMMIT)
> + return ERR_PTR(-ENXIO);
> +
> + cxlr_sysram = kzalloc(sizeof(*cxlr_sysram), GFP_KERNEL);
> + if (!cxlr_sysram)
> + return ERR_PTR(-ENOMEM);
> +
> + cxlr_sysram->hpa_range.start = p->res->start;
> + cxlr_sysram->hpa_range.end = p->res->end;
> + cxlr_sysram->online_type = -1; /* Require explicit configuration */
> +
> + dev = &cxlr_sysram->dev;
> + cxlr_sysram->cxlr = cxlr;
> + device_initialize(dev);
> + lockdep_set_class(&dev->mutex, &cxl_sysram_region_key);
> + device_set_pm_not_required(dev);
> + dev->parent = &cxlr->dev;
> + dev->bus = &cxl_bus_type;
> + dev->type = &cxl_sysram_region_type;
> +
> + return cxlr_sysram;
> +}
> +
> +static void cxlr_sysram_unregister(void *_cxlr_sysram)
> +{
> + struct cxl_sysram_region *cxlr_sysram = _cxlr_sysram;
> +
> + device_unregister(&cxlr_sysram->dev);
> +}
> +
> +int devm_cxl_add_sysram_region(struct cxl_region *cxlr)
> +{
> + struct cxl_sysram_region *cxlr_sysram;
> + struct device *dev;
> + int rc;
> +
> + cxlr_sysram = cxl_sysram_region_alloc(cxlr);
> + if (IS_ERR(cxlr_sysram))
> + return PTR_ERR(cxlr_sysram);
Same thing as above
Thanks,
Ben
> +
> + dev = &cxlr_sysram->dev;
> + rc = dev_set_name(dev, "sysram_region%d", cxlr->id);
> + if (rc)
> + goto err;
> +
> + rc = device_add(dev);
> + if (rc)
> + goto err;
> +
> + dev_dbg(&cxlr->dev, "%s: register %s\n", dev_name(dev->parent),
> + dev_name(dev));
> +
> + return devm_add_action_or_reset(&cxlr->dev, cxlr_sysram_unregister,
> + cxlr_sysram);
> +err:
> + put_device(dev);
> + return rc;
> +}
> +
> +static int cxl_sysram_region_driver_probe(struct device *dev)
> +{
> + struct cxl_region *cxlr = to_cxl_region(dev);
> +
> + /* Only handle RAM regions */
> + if (cxlr->mode != CXL_PARTMODE_RAM)
> + return -ENODEV;
> +
> + return devm_cxl_add_sysram_region(cxlr);
> +}
> +
> +struct cxl_driver cxl_sysram_region_driver = {
> + .name = "cxl_sysram_region",
> + .probe = cxl_sysram_region_driver_probe,
> + .id = CXL_DEVICE_REGION,
> +};
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 674d5f870c70..1544c27e9c89 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -596,6 +596,25 @@ struct cxl_dax_region {
> enum dax_driver_type dax_driver;
> };
>
> +/**
> + * struct cxl_sysram_region - CXL RAM region for system memory hotplug
> + * @dev: device for this sysram_region
> + * @cxlr: parent cxl_region
> + * @hpa_range: Host physical address range for the region
> + * @online_type: Memory online type (MMOP_* 0-3, or -1 if not configured)
> + *
> + * Intermediate device that allows configuration of memory hotplug
> + * parameters before the underlying dax_region is created. The device
> + * starts with online_type=-1 which prevents the cxl_dax_kmem_region
> + * driver from binding until the user explicitly sets online_type.
> + */
> +struct cxl_sysram_region {
> + struct device dev;
> + struct cxl_region *cxlr;
> + struct range hpa_range;
> + int online_type;
> +};
> +
> /**
> * struct cxl_port - logical collection of upstream port devices and
> * downstream port devices to construct a CXL memory
> @@ -890,6 +909,7 @@ void cxl_driver_unregister(struct cxl_driver *cxl_drv);
> #define CXL_DEVICE_PMEM_REGION 7
> #define CXL_DEVICE_DAX_REGION 8
> #define CXL_DEVICE_PMU 9
> +#define CXL_DEVICE_SYSRAM_REGION 10
>
> #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
> #define CXL_MODALIAS_FMT "cxl:t%d"
> @@ -907,6 +927,7 @@ bool is_cxl_pmem_region(struct device *dev);
> struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
> int cxl_add_to_region(struct cxl_endpoint_decoder *cxled);
> struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
> +struct cxl_sysram_region *to_cxl_sysram_region(struct device *dev);
> u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa);
> #else
> static inline bool is_cxl_pmem_region(struct device *dev)
> @@ -925,6 +946,10 @@ static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
> {
> return NULL;
> }
> +static inline struct cxl_sysram_region *to_cxl_sysram_region(struct device *dev)
> +{
> + return NULL;
> +}
> static inline u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint,
> u64 spa)
> {
Powered by blists - more mailing lists