Message-ID: <aEqrY3fEWGKl8rf2@pollux>
Date: Thu, 12 Jun 2025 12:26:43 +0200
From: Danilo Krummrich <dakr@...nel.org>
To: Benno Lossin <lossin@...nel.org>
Cc: gregkh@...uxfoundation.org, rafael@...nel.org, ojeda@...nel.org,
alex.gaynor@...il.com, boqun.feng@...il.com, gary@...yguo.net,
bjorn3_gh@...tonmail.com, benno.lossin@...ton.me,
a.hindborg@...nel.org, aliceryhl@...gle.com, tmgross@...ch.edu,
chrisi.schrefl@...il.com, rust-for-linux@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
On Thu, Jun 12, 2025 at 10:13:29AM +0200, Benno Lossin wrote:
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> > In Devres::drop() we first remove the devres action and then drop the
> > wrapped device resource.
> >
> > The design goal is to give the owner of a Devres object control over when
> > the device resource is dropped, but limit the overall scope to the
> > corresponding device being bound to a driver.
> >
> > However, there's a race that was introduced with commit 8ff656643d30
> > ("rust: devres: remove action in `Devres::drop`"), but also has been
> > (partially) present from the initial version on.
> >
> > In Devres::drop(), the devres action is removed successfully and
> > subsequently the destructor of the wrapped device resource runs.
> > However, there is no guarantee that the destructor of the wrapped device
> > resource completes before the driver core is done unbinding the
> > corresponding device.
> >
> > If in Devres::drop(), the devres action can't be removed, it means that
> > the devres callback has been executed already, or is still running
> > concurrently. In case of the latter, either Devres::drop() wins revoking
> > the Revocable or the devres callback wins revoking the Revocable. If
> > Devres::drop() wins, we (again) have no guarantee that the destructor of
> > the wrapped device resource completes before the driver core is done
> > unbinding the corresponding device.
>
> I don't understand the exact sequence of events here. Here is what I got
> from your explanation:
>
> * the driver created a `Devres<T>` associated to their device.
> * their physical device gets disconnected and thus the driver core
> starts unbinding the device.
> * simultaneously, the driver drops the `Devres<T>` (eg because the
> driver initiated the physical removal)
> * now `devres_callback` is being called from both `Devres::Drop` (which
> calls `Devres::remove_action`) and from the driver core.
> * they both call `inner.data.revoke()`, but only one wins, in our
> example `Devres::drop`.
> * but now the driver core has finished running `devres_callback` and
> finalizes unbinding the device, even though the `Devres` still exists
> though is almost done being dropped.

Your "almost done being dropped" is close. More precisely, Devres::drop() may
or may not have finished Revocable::revoke() yet, i.e. the drop_in_place() of
the data:

        CPU0                                    CPU1

        Devres::drop() {                        devres_callback() {
           self.data.revoke() {                    this.data.revoke() {
              is_available.swap() == true
                                                      is_available.swap == false
                                                   }
                                                }

                                                // [...]
                                                // driver fully unbound
              drop_in_place() {
                 pci_iounmap()
                 pci_release_region()
              }
           }
        }

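To make the interleaving concrete, here is a small userspace model of the
"only one revoke() wins" behaviour. It is a sketch, not the kernel code:
ModelRevocable and count_winners are hypothetical names, std's AtomicBool and
Mutex stand in for the kernel primitives, and RCU is left out entirely. The
point it shows is that the loser of the swap returns immediately and never
waits for the winner's drop to finish, which is exactly the window in the
diagram above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical userspace stand-in for `Revocable<T>`; not the kernel type.
struct ModelRevocable<T> {
    is_available: AtomicBool,
    data: Mutex<Option<T>>,
}

impl<T> ModelRevocable<T> {
    fn new(data: T) -> Self {
        Self {
            is_available: AtomicBool::new(true),
            data: Mutex::new(Some(data)),
        }
    }

    /// Only the caller whose swap observes `true` drops the data.
    /// The other caller returns `false` at once and does not wait
    /// for the winner's drop to finish.
    fn revoke(&self) -> bool {
        if self.is_available.swap(false, Ordering::AcqRel) {
            drop(self.data.lock().unwrap().take()); // models drop_in_place()
            true
        } else {
            false
        }
    }
}

/// Races two `revoke()` calls (one per "CPU") and counts the winners.
fn count_winners() -> usize {
    let r = Arc::new(ModelRevocable::new(vec![0u8; 64]));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let r = Arc::clone(&r);
            thread::spawn(move || r.revoke())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}

fn main() {
    println!("winners: {}", count_winners());
}
```

Because the swap is atomic, exactly one thread ever wins; what the model
cannot stop is the losing thread carrying on (in the kernel: the driver core
finishing the unbind) while the winner is still inside its drop.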
This means that we have to ensure that the revoke() in Devres::drop() is
completed before devres_callback() completes, in case they race.
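One way to express that requirement, sketched in the same userspace model:
give the shared state a completion that the Devres::drop() side signals after
it has dropped the data, and let devres_callback() wait on it whenever it
loses the swap. Everything here is an assumption for illustration: Inner,
Completion, devres_drop, devres_callback and race_once are hypothetical
userspace stand-ins (Completion is built from std's Mutex and Condvar), and
the devres action removal and RCU are omitted. It is not necessarily how the
actual patch is structured.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical userspace stand-in for a kernel completion.
struct Completion {
    done: Mutex<bool>,
    cv: Condvar,
}

impl Completion {
    fn new() -> Self {
        Self { done: Mutex::new(false), cv: Condvar::new() }
    }
    fn complete_all(&self) {
        *self.done.lock().unwrap() = true;
        self.cv.notify_all();
    }
    fn wait_for_completion(&self) {
        let mut done = self.done.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Hypothetical model of the state shared by `Devres` and the devres callback.
struct Inner<T> {
    is_available: AtomicBool,
    data: Mutex<Option<T>>,
    revoked: Completion,
}

impl<T> Inner<T> {
    /// Returns `true` for exactly one caller: the one allowed to drop `data`.
    fn try_revoke(&self) -> bool {
        self.is_available.swap(false, Ordering::AcqRel)
    }
    fn drop_data(&self) {
        drop(self.data.lock().unwrap().take());
    }
    fn is_dropped(&self) -> bool {
        self.data.lock().unwrap().is_none()
    }
}

/// Models `Devres::drop()`: if it wins, drop the data, then signal completion.
fn devres_drop<T>(inner: &Inner<T>) {
    if inner.try_revoke() {
        inner.drop_data();
        inner.revoked.complete_all();
    }
}

/// Models `devres_callback()`; reports whether the data was gone on return.
fn devres_callback<T>(inner: &Inner<T>) -> bool {
    if inner.try_revoke() {
        inner.drop_data();
    } else {
        // Lost the race: wait until `devres_drop` has finished dropping.
        inner.revoked.wait_for_completion();
    }
    inner.is_dropped()
}

/// Runs both sides concurrently once.
fn race_once() -> bool {
    let inner = Arc::new(Inner {
        is_available: AtomicBool::new(true),
        data: Mutex::new(Some(vec![0u8; 64])),
        revoked: Completion::new(),
    });
    let other = Arc::clone(&inner);
    let drop_thread = thread::spawn(move || devres_drop(&other));
    let callback_saw_dropped = devres_callback(&inner);
    drop_thread.join().unwrap();
    callback_saw_dropped
}

fn main() {
    for _ in 0..1000 {
        assert!(race_once());
    }
    println!("devres_callback() never returned before the data was dropped");
}
```

Whichever side wins the swap, devres_callback() in this model only returns
once the data has been dropped, which is the ordering the unbind path needs.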
> I don't see a race here. Also the `dev: ARef<Device>` should keep the
> device alive until the `Devres` is dropped, no?
Yes, the device reference is fine.