Message-ID: <2024071227-surname-satirical-1184@gregkh>
Date: Fri, 12 Jul 2024 10:56:48 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Dirk Behme <dirk.behme@...bosch.com>, linux-kernel@...r.kernel.org,
Rafael J Wysocki <rafael@...nel.org>,
Eugeniu Rosca <eugeniu.rosca@...ch.com>,
syzbot+ffa8143439596313a85a@...kaller.appspotmail.com,
Ashish Sangwan <a.sangwan@...sung.com>,
Namjae Jeon <namjae.jeon@...sung.com>, linux-cxl@...r.kernel.org
Subject: Re: [PATCH v2] drivers: core: synchronize really_probe() and
dev_uevent()
On Thu, Jul 11, 2024 at 05:07:21PM -0700, Dan Williams wrote:
> Dirk Behme wrote:
> > Synchronize the dev->driver usage in really_probe() and dev_uevent().
> > These can run in different threads, which can result in the following
> > race condition for dev->driver uninitialization:
>
> This fix introduces an ABBA deadlock scenario via the known antipattern
> of taking the device_lock() within device attributes that are removed
> while the lock is held.
Ugh, yes :(
Device attributes should not be taking that lock. Don't we have a
different call for an attribute that will be removing itself?
> Lockdep splat below. I previously reported this on a syzbot report
> against nvdimm subsystems with a more complicated splat [1], but this one
> is more straightforward.
>
> Recall that the reason this lockdep report is not widespread is because
> CXL and NVDIMM are among the only subsystems that add lockdep coverage
> to device_lock() with a local key.
>
> [1]: http://lore.kernel.org/667a2ae44c0c0_5be92947e@dwillia2-mobl3.amr.corp.intel.com.notmuch
>
> One potential hack is something like this if it is backstopped with
> synchronization between unregistering drivers from buses relative to
> uevent callbacks for those buses:
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2b4c0624b704..dfba73ef39af 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2640,6 +2640,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
>  static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
>  {
>  	const struct device *dev = kobj_to_dev(kobj);
> +	struct device_driver *driver;
>  	int retval = 0;
>
>  	/* add device node properties if present */
> @@ -2668,8 +2669,14 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
>  	if (dev->type && dev->type->name)
>  		add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
>
> -	if (dev->driver)
> -		add_uevent_var(env, "DRIVER=%s", dev->driver->name);
> +	/*
> +	 * While it is likely that this races driver detach, it is
> +	 * unlikely that any driver attached with this device is racing being
> +	 * freed relative to a uevent for the same device
> +	 */
> +	driver = READ_ONCE(dev->driver);
> +	if (driver)
> +		add_uevent_var(env, "DRIVER=%s", driver->name);
>
>  	/* Add common DT information about the device */
>  	of_device_uevent(dev, env);
>
I'll take this patch for now if you also want to include the removal of
the locking patch that caused your splat.
thanks,
greg k-h
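
For readers following the quoted diff, the snapshot pattern it relies on can
be sketched in userspace C. This is only an illustration, not kernel code:
the READ_ONCE() macro below is a simplified stand-in that assumes GCC or
Clang (__typeof__), and struct demo_driver, struct demo_device and
demo_format_driver() are hypothetical names that do not exist in
drivers/base/core.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Simplified userspace stand-in for the kernel's READ_ONCE(): perform
 * exactly one load through a volatile-qualified access.
 * Assumption: GCC or Clang, for __typeof__.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, cut-down stand-ins for struct device_driver / struct device. */
struct demo_driver {
	const char *name;
};

struct demo_device {
	struct demo_driver *driver;	/* may be cleared by another thread */
};

/*
 * Format "DRIVER=<name>" into buf if a driver is bound.
 * Returns snprintf()'s result, or 0 if no driver is bound.
 */
static int demo_format_driver(const struct demo_device *dev, char *buf, size_t len)
{
	/* Snapshot the pointer once so the check and the use see the same value. */
	struct demo_driver *driver = READ_ONCE(dev->driver);

	if (!driver)
		return 0;

	return snprintf(buf, len, "DRIVER=%s", driver->name);
}
```

The single load only closes the window where dev->driver is checked as
non-NULL and then re-read as NULL. It does not keep the driver structure
itself alive, which is why the quoted proposal asks for it to be backstopped
by synchronization between driver unregistration and uevent callbacks.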