[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0c2a91f6-e94c-4898-b259-158be655b630@amd.com>
Date: Sat, 20 Dec 2025 07:31:22 +0000
From: Alejandro Lucero Palau <alucerop@....com>
To: Dan Williams <dan.j.williams@...el.com>, dave.jiang@...el.com
Cc: linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
Smita.KoralahalliChannabasappa@....com, alison.schofield@...el.com,
terry.bowman@....com, alejandro.lucero-palau@....com,
linux-pci@...r.kernel.org, Jonathan.Cameron@...wei.com,
Ben Cheatham <benjamin.cheatham@....com>
Subject: Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for
CXL-dependent operation
On 12/16/25 00:56, Dan Williams wrote:
> Unlike the cxl_pci class driver that opportunistically enables memory
> expansion with no other dependent functionality, CXL accelerator drivers
> have distinct PCIe-only and CXL-enhanced operation states. If CXL is
> available some additional coherent memory/cache operations can be enabled,
> otherwise traditional DMA+MMIO over PCIe/CXL.io is a fallback.
This can be misleading. I bet most Type2 drivers will indeed use CXL.io
and traditional DMA along with some special functionality with CXL.mem
and CXL.cache.
>
> Allow for a driver to pass a routine to be called in cxl_mem_probe()
> context. This ability is inspired by and mirrors the semantics of
> faux_device_create(). It allows for the caller to run CXL-topology
> attach-dependent logic on behalf of the caller.
>
> The probe callback runs after the port topology is successfully attached
> for the given memdev.
As discussed at LPC, I would like to reflect here this potential
problem, with port not ready, is mainly due to CXL switches being in the
path to the device, unlike an accelerator directly connected to a CXL
Host Bridge which will have all this sorted out at the time of driver
probing. If there exist other cases, that should be also documented here
or in the future when discovered.
>
> Additionally the presence of @cxlmd->attach indicates that the accelerator
> driver be detached when CXL operation ends. This conceptually makes a CXL
> link loss event mirror a PCIe link loss event which results in triggering
> the ->remove() callback of affected devices+drivers.
Also as discussed at LPC, I do not think this should happen. The
accelerator driver should of course get a notification about this but
not removed. Moreover, should not a link error being reported somehow
through the RAS infrastructure and the driver reacting accordingly?
FWIW, there is a pending work by Vishal Aslot based on Type2 support for
this RAS support what I have discussed with him for being a follow-up
work of Type2.
> A driver can re-attach
> to recover back to PCIe-only operation. Live recovery, i.e. without a
> ->remove()/->probe() cycle, is left as a future consideration.
>
> Cc: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@....com>
> Reviewed-by: Dave Jiang <dave.jiang@...el.com> (✓ DKIM/intel.com)
> Tested-by: Alejandro Lucero <alucerop@....com> (✓ DKIM/amd.com)
> Signed-off-by: Dan Williams <dan.j.williams@...el.com>
> ---
> drivers/cxl/cxlmem.h | 12 ++++++++++--
> drivers/cxl/core/memdev.c | 33 +++++++++++++++++++++++++++++----
> drivers/cxl/mem.c | 20 ++++++++++++++++----
> drivers/cxl/pci.c | 2 +-
> tools/testing/cxl/test/mem.c | 2 +-
> 5 files changed, 57 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 9db31c7993c4..ef202b34e5ea 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -34,6 +34,10 @@
> (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
> CXLMDEV_RESET_NEEDED_NOT)
>
> +struct cxl_memdev_attach {
> + int (*probe)(struct cxl_memdev *cxlmd);
> +};
> +
> /**
> * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
> * @dev: driver core device object
> @@ -43,6 +47,7 @@
> * @cxl_nvb: coordinate removal of @cxl_nvd if present
> * @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
> * @endpoint: connection to the CXL port topology for this memory device
> + * @attach: creator of this memdev depends on CXL link attach to operate
> * @id: id number of this memdev instance.
> * @depth: endpoint port depth
> * @scrub_cycle: current scrub cycle set for this device
> @@ -59,6 +64,7 @@ struct cxl_memdev {
> struct cxl_nvdimm_bridge *cxl_nvb;
> struct cxl_nvdimm *cxl_nvd;
> struct cxl_port *endpoint;
> + const struct cxl_memdev_attach *attach;
> int id;
> int depth;
> u8 scrub_cycle;
> @@ -95,8 +101,10 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
> return is_cxl_memdev(port->uport_dev);
> }
>
> -struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> -struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> +struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> + const struct cxl_memdev_attach *attach);
> +struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> + const struct cxl_memdev_attach *attach);
> int devm_cxl_sanitize_setup_notifier(struct device *host,
> struct cxl_memdev *cxlmd);
> struct cxl_memdev_state;
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 63da2bd4436e..3ab4cd8f19ed 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -641,14 +641,24 @@ static void detach_memdev(struct work_struct *work)
> struct cxl_memdev *cxlmd;
>
> cxlmd = container_of(work, typeof(*cxlmd), detach_work);
> - device_release_driver(&cxlmd->dev);
> +
> + /*
> + * When the creator of @cxlmd sets ->attach it indicates CXL operation
> + * is required. In that case, @cxlmd detach escalates to parent device
> + * detach.
> + */
> + if (cxlmd->attach)
> + device_release_driver(cxlmd->dev.parent);
> + else
> + device_release_driver(&cxlmd->dev);
> put_device(&cxlmd->dev);
> }
>
> static struct lock_class_key cxl_memdev_key;
>
> static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> - const struct file_operations *fops)
> + const struct file_operations *fops,
> + const struct cxl_memdev_attach *attach)
> {
> struct cxl_memdev *cxlmd;
> struct device *dev;
> @@ -664,6 +674,8 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> goto err;
> cxlmd->id = rc;
> cxlmd->depth = -1;
> + cxlmd->attach = attach;
> + cxlmd->endpoint = ERR_PTR(-ENXIO);
>
> dev = &cxlmd->dev;
> device_initialize(dev);
> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> {
> int rc;
>
> + /*
> + * If @attach is provided fail if the driver is not attached upon
> + * return. Note that failure here could be the result of a race to
> + * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> + * succeeded and then cxl_mem unbound before the lock is acquired.
> + */
> + guard(device)(&cxlmd->dev);
> + if (cxlmd->attach && !cxlmd->dev.driver) {
> + cxl_memdev_unregister(cxlmd);
> + return ERR_PTR(-ENXIO);
> + }
> +
> rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
> cxlmd);
> if (rc)
> @@ -1093,13 +1117,14 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> * Core helper for devm_cxl_add_memdev() that wants to both create a device and
> * assert to the caller that upon return cxl_mem::probe() has been invoked.
> */
> -struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> +struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> + const struct cxl_memdev_attach *attach)
> {
> struct device *dev;
> int rc;
>
> struct cxl_memdev *cxlmd __free(put_cxlmd) =
> - cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
> + cxl_memdev_alloc(cxlds, &cxl_memdev_fops, attach);
> if (IS_ERR(cxlmd))
> return cxlmd;
>
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 677996c65272..333c366b69e7 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -142,6 +142,12 @@ static int cxl_mem_probe(struct device *dev)
> return rc;
> }
>
> + if (cxlmd->attach) {
> + rc = cxlmd->attach->probe(cxlmd);
> + if (rc)
> + return rc;
> + }
> +
> rc = devm_cxl_memdev_edac_register(cxlmd);
> if (rc)
> dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
> @@ -166,17 +172,23 @@ static int cxl_mem_probe(struct device *dev)
> /**
> * devm_cxl_add_memdev - Add a CXL memory device
> * @cxlds: CXL device state to associate with the memdev
> + * @attach: Caller depends on CXL topology attachment
> *
> * Upon return the device will have had a chance to attach to the
> - * cxl_mem driver, but may fail if the CXL topology is not ready
> - * (hardware CXL link down, or software platform CXL root not attached)
> + * cxl_mem driver, but may fail to attach if the CXL topology is not ready
> + * (hardware CXL link down, or software platform CXL root not attached).
> + *
> + * When @attach is NULL it indicates the caller wants the memdev to remain
> + * registered even if it does not immediately attach to the CXL hierarchy. When
> + * @attach is provided a cxl_mem_probe() failure leads to failure of this routine.
> *
> * The parent of the resulting device and the devm context for allocations is
> * @cxlds->dev.
> */
> -struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds)
> +struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> + const struct cxl_memdev_attach *attach)
> {
> - return __devm_cxl_add_memdev(cxlds);
> + return __devm_cxl_add_memdev(cxlds, attach);
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 1c6fc5334806..549368a9c868 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1006,7 +1006,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> dev_dbg(&pdev->dev, "No CXL Features discovered\n");
>
> - cxlmd = devm_cxl_add_memdev(cxlds);
> + cxlmd = devm_cxl_add_memdev(cxlds, NULL);
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 8a22b7601627..cb87e8c0e63c 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1767,7 +1767,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>
> cxl_mock_add_event_logs(&mdata->mes);
>
> - cxlmd = devm_cxl_add_memdev(cxlds);
> + cxlmd = devm_cxl_add_memdev(cxlds, NULL);
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
Powered by blists - more mailing lists