linux-kernel - Re: [PATCH] devcoredump: Fix circular locking dependency with devcd->mutex.

Message-ID: <e683355a9a9f700d98ae0a057063a975bb11fadc.camel@sipsolutions.net>
Date: Fri, 24 Oct 2025 10:12:19 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Maarten Lankhorst <dev@...khorst.se>, linux-kernel@...r.kernel.org
Cc: intel-xe@...ts.freedesktop.org, Mukesh Ojha <quic_mojha@...cinc.com>, 
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, "Rafael J. Wysocki"
 <rafael@...nel.org>, Danilo Krummrich	 <dakr@...nel.org>,
 stable@...r.kernel.org, Matthew Brost <matthew.brost@...el.com>
Subject: Re: [PATCH] devcoredump: Fix circular locking dependency with
 devcd->mutex.

On Wed, 2025-07-23 at 16:24 +0200, Maarten Lankhorst wrote:
> 
> +static void __devcd_del(struct devcd_entry *devcd)
> +{
> +	devcd->deleted = true;
> +	device_del(&devcd->devcd_dev);
> +	put_device(&devcd->devcd_dev);
> +}
> +
>  static void devcd_del(struct work_struct *wk)
>  {
>  	struct devcd_entry *devcd;
> +	bool init_completed;
>  
>  	devcd = container_of(wk, struct devcd_entry, del_wk.work);
>  
> -	device_del(&devcd->devcd_dev);
> -	put_device(&devcd->devcd_dev);
> +	/* devcd->mutex serializes against dev_coredumpm_timeout */
> +	mutex_lock(&devcd->mutex);
> +	init_completed = devcd->init_completed;
> +	mutex_unlock(&devcd->mutex);
> +
> +	if (init_completed)
> +		__devcd_del(devcd);

I'm not sure I understand this completely right now. I think you pull
the __devcd_del() call out from under the mutex because otherwise the
mutex_unlock() could/would be a use-after-free, right?

But also we have this:

> @@ -151,11 +160,21 @@ static int devcd_free(struct device *dev, void *data)
>  {
>  	struct devcd_entry *devcd = dev_to_devcd(dev);
>  
> +	/*
> +	 * To prevent a race with devcd_data_write(), disable work and
> +	 * complete manually instead.
> +	 *
> +	 * We cannot rely on the return value of
> +	 * disable_delayed_work_sync() here, because it might be in the
> +	 * middle of a cancel_delayed_work + schedule_delayed_work pair.
> +	 *
> +	 * devcd->mutex here guards against multiple parallel invocations
> +	 * of devcd_free().
> +	 */
> +	disable_delayed_work_sync(&devcd->del_wk);
>  	mutex_lock(&devcd->mutex);
> -	if (!devcd->delete_work)
> -		devcd->delete_work = true;
> -
> -	flush_delayed_work(&devcd->del_wk);
> +	if (!devcd->deleted)
> +		__devcd_del(devcd);
>  	mutex_unlock(&devcd->mutex);

^^^^

Which I _think_ is probably OK because devcd_free() is only called with
an extra reference held (from the for-each/find device iteration).

But ... doesn't that then still have unbalanced calls to __devcd_del()
and thus device_del()/put_device()?

CPU 0				CPU 1

dev_coredump_put()		devcd_del()
 -> devcd_free()
   -> locked
     -> !deleted
     -> __devcd_del()
				-> __devcd_del()

no?
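
To make the ordering in the diagram concrete, here is a minimal userspace
sketch of it. Everything in it is an assumption for illustration: the
model_* names are hypothetical, a counter stands in for the
device_del()/put_device() pair, pthreads stands in for the kernel mutex,
and disable_delayed_work_sync() is deliberately not modelled. So it only
shows what happens if both paths do run in that order, not whether the
kernel can actually get there.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct devcd_entry. */
struct model_devcd {
	pthread_mutex_t mutex;
	bool init_completed;
	bool deleted;
	int del_calls;	/* counts device_del()/put_device() pairs */
};

/* Mirrors __devcd_del() from the patch; the counter replaces the real calls. */
static void model_do_del(struct model_devcd *devcd)
{
	devcd->deleted = true;
	devcd->del_calls++;
}

/*
 * Mirrors the patched devcd_del(): init_completed is sampled under the
 * mutex, but ->deleted is never checked before deleting.
 */
static void model_devcd_del(struct model_devcd *devcd)
{
	bool init_completed;

	pthread_mutex_lock(&devcd->mutex);
	init_completed = devcd->init_completed;
	pthread_mutex_unlock(&devcd->mutex);

	if (init_completed)
		model_do_del(devcd);
}

/*
 * Mirrors only the locked part of the patched devcd_free().
 * Assumption: disable_delayed_work_sync() is left out on purpose.
 */
static void model_devcd_free(struct model_devcd *devcd)
{
	pthread_mutex_lock(&devcd->mutex);
	if (!devcd->deleted)
		model_do_del(devcd);
	pthread_mutex_unlock(&devcd->mutex);
}

/* Replays the diagram's order: devcd_free() first, then devcd_del(). */
static int model_replay_diagram(void)
{
	struct model_devcd devcd = {
		.init_completed = true,
		.deleted = false,
		.del_calls = 0,
	};
	int calls;

	pthread_mutex_init(&devcd.mutex, NULL);
	model_devcd_free(&devcd);	/* CPU 0 column */
	model_devcd_del(&devcd);	/* CPU 1 column */
	calls = devcd.del_calls;
	pthread_mutex_destroy(&devcd.mutex);

	return calls;
}
```

Under those assumptions model_replay_diagram() returns 2, i.e. two
delete/put pairs for one entry, which is the imbalance I'm asking about.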

johannes