[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aYq_zpNAgNgJY_P8@gourry-fedora-PF4VCD3F>
Date: Tue, 10 Feb 2026 00:19:10 -0500
From: Gregory Price <gourry@...rry.net>
To: Dan Williams <dan.j.williams@...el.com>
Cc: dave.jiang@...el.com, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org,
Smita.KoralahalliChannabasappa@....com, alison.schofield@...el.com,
terry.bowman@....com, alejandro.lucero-palau@....com,
linux-pci@...r.kernel.org, Jonathan.Cameron@...wei.com,
Ben Cheatham <benjamin.cheatham@....com>,
Alejandro Lucero <alucerop@....com>
Subject: Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for
CXL-dependent operation
On Mon, Dec 15, 2025 at 04:56:16PM -0800, Dan Williams wrote:
> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> {
> int rc;
>
> + /*
> + * If @attach is provided fail if the driver is not attached upon
> + * return. Note that failure here could be the result of a race to
> + * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> + * succeeded and then cxl_mem unbound before the lock is acquired.
> + */
> + guard(device)(&cxlmd->dev);
> + if (cxlmd->attach && !cxlmd->dev.driver) {
> + cxl_memdev_unregister(cxlmd);
> + return ERR_PTR(-ENXIO);
> + }
> +
> rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
> cxlmd);
> if (rc)
This caused a deadlock when trying to unbind cxl_pci and bind a custom
driver.
The followwing analysis was produced by Chris Mason's kreview scripts,
but it did resolve my deadlock, and the analysis seems pretty straight
forward.
Although it was likely caused by a qemu quirk I was dealing with at the
same time (see the note about topology failing to enumerate).
~Gregory
---
cxl_memdev_autoremove() takes device_lock(&cxlmd->dev) via guard(device)
and then calls cxl_memdev_unregister() when the attach callback was
provided but cxl_mem_probe() failed to bind.
cxl_memdev_unregister() calls cdev_device_del() → device_del() →
bus_remove_device() → device_release_driver() which also takes
device_lock(), deadlocking the calling thread.
This path is reached when a driver uses the @attach parameter to
devm_cxl_add_memdev() and the CXL topology fails to enumerate (e.g.
DVSEC range registers decode outside platform-defined CXL ranges,
causing the endpoint port probe to fail).
Fix by using scoped_guard() and breaking out of the guard scope before
calling cxl_memdev_unregister(), so device_lock() is released first.
It suggested:
---
static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
{
int rc;
/*
* If @attach is provided fail if the driver is not attached upon
* return. Note that failure here could be the result of a race to
* teardown the CXL port topology. I.e. cxl_mem_probe() could have
* succeeded and then cxl_mem unbound before the lock is acquired.
*
* Check under device_lock but unregister outside of it:
* cxl_memdev_unregister() calls cdev_device_del() → device_del()
* → device_release_driver() which also takes device_lock().
*/
scoped_guard(device, &cxlmd->dev) {
if (cxlmd->attach && !cxlmd->dev.driver) {
/* Drop lock before unregister to avoid deadlock */
break;
}
rc = devm_add_action_or_reset(cxlmd->cxlds->dev,
cxl_memdev_unregister, cxlmd);
if (rc)
return ERR_PTR(rc);
return cxlmd;
}
/* Reached only when attach failed — lock is released */
cxl_memdev_unregister(cxlmd);
return ERR_PTR(-ENXIO);
}
Powered by blists - more mailing lists