[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a6b2ba96-ce22-426b-8277-8121ed77fa55@intel.com>
Date: Tue, 10 Feb 2026 08:05:21 -0700
From: Dave Jiang <dave.jiang@...el.com>
To: Gregory Price <gourry@...rry.net>, Dan Williams <dan.j.williams@...el.com>
Cc: linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
Smita.KoralahalliChannabasappa@....com, alison.schofield@...el.com,
terry.bowman@....com, alejandro.lucero-palau@....com,
linux-pci@...r.kernel.org, Jonathan.Cameron@...wei.com,
Ben Cheatham <benjamin.cheatham@....com>, Alejandro Lucero <alucerop@....com>
Subject: Re: [PATCH v2 6/6] cxl/mem: Introduce cxl_memdev_attach for
CXL-dependent operation
On 2/9/26 10:19 PM, Gregory Price wrote:
> On Mon, Dec 15, 2025 at 04:56:16PM -0800, Dan Williams wrote:
>> @@ -1081,6 +1093,18 @@ static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
>> {
>> int rc;
>>
>> + /*
>> + * If @attach is provided fail if the driver is not attached upon
>> + * return. Note that failure here could be the result of a race to
>> + * teardown the CXL port topology. I.e. cxl_mem_probe() could have
>> + * succeeded and then cxl_mem unbound before the lock is acquired.
>> + */
>> + guard(device)(&cxlmd->dev);
>> + if (cxlmd->attach && !cxlmd->dev.driver) {
>> + cxl_memdev_unregister(cxlmd);
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> rc = devm_add_action_or_reset(cxlmd->cxlds->dev, cxl_memdev_unregister,
>> cxlmd);
>> if (rc)
>
>
> This caused a deadlock when trying to unbind cxl_pci and bind a custom
> driver.
>
> The followwing analysis was produced by Chris Mason's kreview scripts,
> but it did resolve my deadlock, and the analysis seems pretty straight
> forward.
The suggested resolution looks reasonable. Can you post a fix patch? Will need to apply it against 7.0-rc. It's too late to get it into the 7.0 merge PR.
DJ
>
> Although it was likely caused by a qemu quirk I was dealing with at the
> same time (see the note about topology failing to enumerate).
>
> ~Gregory
>
> ---
>
> cxl_memdev_autoremove() takes device_lock(&cxlmd->dev) via guard(device)
> and then calls cxl_memdev_unregister() when the attach callback was
> provided but cxl_mem_probe() failed to bind.
>
> cxl_memdev_unregister() calls cdev_device_del() → device_del() →
> bus_remove_device() → device_release_driver() which also takes
> device_lock(), deadlocking the calling thread.
>
> This path is reached when a driver uses the @attach parameter to
> devm_cxl_add_memdev() and the CXL topology fails to enumerate (e.g.
> DVSEC range registers decode outside platform-defined CXL ranges,
> causing the endpoint port probe to fail).
>
> Fix by using scoped_guard() and breaking out of the guard scope before
> calling cxl_memdev_unregister(), so device_lock() is released first.
>
> It suggested:
> ---
>
> static struct cxl_memdev *cxl_memdev_autoremove(struct cxl_memdev *cxlmd)
> {
> int rc;
>
> /*
> * If @attach is provided fail if the driver is not attached upon
> * return. Note that failure here could be the result of a race to
> * teardown the CXL port topology. I.e. cxl_mem_probe() could have
> * succeeded and then cxl_mem unbound before the lock is acquired.
> *
> * Check under device_lock but unregister outside of it:
> * cxl_memdev_unregister() calls cdev_device_del() → device_del()
> * → device_release_driver() which also takes device_lock().
> */
> scoped_guard(device, &cxlmd->dev) {
> if (cxlmd->attach && !cxlmd->dev.driver) {
> /* Drop lock before unregister to avoid deadlock */
> break;
> }
>
> rc = devm_add_action_or_reset(cxlmd->cxlds->dev,
> cxl_memdev_unregister, cxlmd);
> if (rc)
> return ERR_PTR(rc);
>
> return cxlmd;
> }
>
> /* Reached only when attach failed — lock is released */
> cxl_memdev_unregister(cxlmd);
> return ERR_PTR(-ENXIO);
> }
>
Powered by blists - more mailing lists