Message-ID: <2025101714-fiction-reprocess-9368@gregkh>
Date: Fri, 17 Oct 2025 09:41:21 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: Deepak Sharma <deepak.sharma.472935@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-kernel-mentees@...ts.linux.dev,
	syzbot+6c905ab800f20cf4086c@...kaller.appspotmail.com
Subject: Re: [PATCH] drivers: core: Fix synchronization of removal of device
 with rpm work

On Wed, Sep 17, 2025 at 08:39:55AM +0530, Deepak Sharma wrote:
> Syzbot reports a use-after-free in `rpm_suspend`, while the free
> occurs in `usb_disconnect`.
> 
> All line number references are for commit ID
> d69eb204c255c35abd9e8cb621484e8074c75eaa

Which is 6.17-rc5?

Please always include the full commit information when referencing git
ids.  This would be:
	d69eb204c255 ("Merge tag 'net-6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")

Which is an odd point in our tree :)

> This points to a possible synchronization issue. In `usb_disconnect`
> there's a call to `pm_runtime_barrier`, but it does nothing more than
> act as a sort of "flush" (cancelling any pending rpm requests that
> have not started yet). Nor is the device usage count increased
> anywhere in this stack trace after that call.

How is syzbot triggering any of this?  How is it disconnecting a device,
is this through the gadget api or something else?

> Eventually `device_del` is called, which in turn calls
> `device_pm_remove`. Nothing after that `pm_runtime_barrier` has
> synchronized with the PM core in any way so far.
> 
> Now suppose that the work queued for `rpm_suspend` on timer
> expiration runs during this unsynchronized window. A few interesting
> situations can arise here; I will address one.
> 
> Suppose the `rpm_suspend` work drops `dev->power.lock` at
> `drivers/base/power/runtime.c:723`, and `device_pm_remove` then
> proceeds as normal, cleaning up the device. No later call cancels the
> pending work, and since the lock has been released, removal carries
> on and ends up freeing the device while that work still uses it,
> which leads to the use-after-free observed.
> 
> So during device removal, we could call `pm_runtime_forbid` followed
> by `pm_runtime_barrier`. This waits for any pending work to complete
> and prevents any new work from being queued.
> 
> Once those return, we can call `device_pm_remove`. `pm_runtime_forbid`
> does not seem to influence the behavior of `device_pm_remove` (it
> does touch the device usage count via `pm_runtime_get_noresume()`,
> but removal would still work the same).
> 
> Reported-by: syzbot+6c905ab800f20cf4086c@...kaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=6c905ab800f20cf4086c
> Signed-off-by: Deepak Sharma <deepak.sharma.472935@...il.com>
> ---
>  drivers/base/core.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index d22d6b23e758..616fd02d18ed 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3876,7 +3876,13 @@ void device_del(struct device *dev)
>  	device_remove_file(dev, &dev_attr_uevent);
>  	device_remove_attrs(dev);
>  	bus_remove_device(dev);
> +	/* We need to forbid and then proceed with a barrier here,
> +	 * so that any pending work is flushed 
> +	*/

Trailing whitespace which checkpatch should have caught :(

Also odd comment style.

And you don't document what type of barrier or what type of pending work
you are flushing.

> +	pm_runtime_forbid(dev);
> +	pm_runtime_barrier(dev);
>  	device_pm_remove(dev);
> +	pm_runtime_allow(dev);

Why are you allowing this to happen again?  The device is going away, it
should be stopped by now as per the bus removal.

This all feels very fragile.

thanks,

greg k-h
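The ordering proposed in the patch (forbid new runtime PM work, wait for work already in flight, and only then tear the device down) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_dev` and the `model_*` helpers are hypothetical, the mutex only stands in for `dev->power.lock`, and nothing here reproduces the real semantics of `pm_runtime_forbid()` or `pm_runtime_barrier()` (for example, the real barrier also cancels requests that have not started yet).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, NOT kernel code: model_dev and the
 * model_* helpers only mimic the ordering "forbid new async work,
 * wait for in-flight work, then tear down".
 */
struct model_dev {
	pthread_mutex_t lock;	/* stands in for dev->power.lock */
	pthread_cond_t idle;	/* signalled when in-flight work finishes */
	bool forbidden;		/* set by model_forbid(): refuse new work */
	bool work_pending;	/* async work queued or running */
	bool has_worker;	/* a worker thread exists and must be joined */
	pthread_t worker;
	int suspends;		/* completed "suspend callbacks" */
};

static struct model_dev *model_dev_new(void)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	pthread_mutex_init(&dev->lock, NULL);
	pthread_cond_init(&dev->idle, NULL);
	return dev;
}

static void *suspend_work(void *arg)
{
	struct model_dev *dev = arg;

	/* The "callback" runs without the lock held but still uses dev. */
	dev->suspends++;

	pthread_mutex_lock(&dev->lock);
	dev->work_pending = false;
	pthread_cond_broadcast(&dev->idle);
	pthread_mutex_unlock(&dev->lock);
	return NULL;
}

/* Queue async "suspend" work; fails if forbidden or a worker exists. */
static bool model_queue_suspend(struct model_dev *dev)
{
	bool queued = false;

	pthread_mutex_lock(&dev->lock);
	if (!dev->forbidden && !dev->has_worker) {
		dev->work_pending = true;
		if (pthread_create(&dev->worker, NULL, suspend_work, dev) == 0) {
			dev->has_worker = true;
			queued = true;
		} else {
			dev->work_pending = false;
		}
	}
	pthread_mutex_unlock(&dev->lock);
	return queued;
}

/* Loosely modelled on the patch's pm_runtime_forbid(): blocks new work only. */
static void model_forbid(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->forbidden = true;
	pthread_mutex_unlock(&dev->lock);
}

/* Barrier analogue: wait for in-flight work, then reap its thread. */
static void model_barrier(struct model_dev *dev)
{
	bool join;
	pthread_t worker;

	pthread_mutex_lock(&dev->lock);
	while (dev->work_pending)
		pthread_cond_wait(&dev->idle, &dev->lock);
	join = dev->has_worker;
	worker = dev->worker;
	dev->has_worker = false;
	pthread_mutex_unlock(&dev->lock);

	if (join)
		pthread_join(worker, NULL);
}

/* Removal in the order proposed in the patch: forbid, barrier, free. */
static void model_remove(struct model_dev *dev)
{
	model_forbid(dev);
	model_barrier(dev);
	/* Nothing can still be running against dev, so freeing is safe. */
	assert(!dev->work_pending && !dev->has_worker);
	pthread_cond_destroy(&dev->idle);
	pthread_mutex_destroy(&dev->lock);
	free(dev);
}
```

In this model, calling `model_queue_suspend()` and then `model_remove()` frees `dev` only after the worker has finished with it; dropping the `model_barrier()` call from `model_remove()` would reopen the window described in the patch. Whether this ordering belongs in the real `device_del()` is a separate question from the model.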
