Message-ID: <20260129105634.GC3317328@killaraus>
Date: Thu, 29 Jan 2026 12:56:34 +0200
From: Laurent Pinchart <laurent.pinchart@...asonboard.com>
To: Bartosz Golaszewski <brgl@...nel.org>
Cc: Johan Hovold <johan@...nel.org>,
Bartosz Golaszewski <bartosz.golaszewski@....qualcomm.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Tzung-Bi Shih <tzungbi@...nel.org>,
Linus Walleij <linusw@...nel.org>, Jonathan Corbet <corbet@....net>,
Shuah Khan <shuah@...nel.org>,
Wolfram Sang <wsa+renesas@...g-engineering.com>,
Simona Vetter <simona.vetter@...ll.ch>,
Dan Williams <dan.j.williams@...el.com>,
Jason Gunthorpe <jgg@...dia.com>, linux-doc@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] Revert "revocable: Revocable resource management"

On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote:
> On Wed, Jan 28, 2026 at 4:48 PM Johan Hovold <johan@...nel.org> wrote:
> > On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote:
> > > On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <johan@...nel.org> wrote:
> >
> > > > It's certainly possible to handle the chardev unplug issue without
> > > > revocable as several subsystems already do. All you need is a refcount,
> > > > a lock and a flag.
> > > >
> > > > It may be possible to provide a generic solution at the chardev level
> > > > or some kind of helper implementation (similar to revocable) for
> > > > subsystems to use directly.
> > >
> > > This echoes the heated exchange I recently had with Johan elsewhere so
> > > I would like to chime in and use the wider forum of driver core
> > > maintainers to settle an important question. It seems there are two
> > > camps in this discussion: one whose perception of the problem is
> > > limited to character devices being referenced from user-space at the
> > > time of the driver unbind (favoring fixing the issues at the vfs
> > > level) and another extending the problem to any driver unbinding where
> > > we cannot ensure a proper ordering of the teardown (for whatever
> > > reason: fw_devlink=off, helper auxiliary devices acting as
> > > intermediates, or even user-space unbinding a driver manually with
> > > bus-level sysfs attributes) leaving consumers of resources exposed by
> > > providers that are gone with dangling references (focusing the
> > > solutions on the subsystem level).
> >
> > What I've been trying to get across is that the chardev hot-unplug issue
> > is real and needs to be fixed where it still exists, while the manual
> > unbinding of drivers by root is a corner case which does not need to be
> > addressed at *any* cost.
> >
> > If addressing the latter means wrapping every resource access in code
> > that adds runtime overhead and makes drivers harder to write and
> > maintain, it *may* not be worth it and we should instead explore
> > alternatives.
>
> Alright, so we *do* agree at least on some parts. :)
>
> I agree that any such change should not affect drivers. If you look at
> the GPIO changes I did or the proposed nvmem rework - it never touched
> drivers, only the subsystem level code. The latter especially is
> really tiny, in fact:
>
> drivers/nvmem/core.c | 172 +++++++++++++++++++++++---------------
> drivers/nvmem/internals.h | 17 +++-
>
> is all you need to make it not crash in the situations I described
> under that series. Runtime overhead in read-sections with SRCU or
> read-write semaphores is negligible and typically we only have to
> write on driver unbind. So that "wrapping every resource access"
> sounds scary but really is not.
>
> GPIO work was bigger but it addressed way more synchronization issues
> than just supplier unbinding.
>
> For I2C both the problem (subsystem waiting forever for consumers to
> release all references) and the culprit are different: memory used to
> hold the reference-counted struct device is released unconditionally
> at supplier unbind. Unfortunately there's no way around it other
> than to first move it into a separate chunk managed by i2c core.

Isn't there? Can't the driver-specific data structure be
reference-counted instead of unconditionally freed at unbind time?

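To make the question concrete, here is a rough userspace sketch of the
idea, not kernel code. Every demo_* name is hypothetical, and in the
kernel the counter would more likely be a kref or the refcount embedded
in struct device. The point is only that unbind drops one reference and
sets a flag, while the memory survives until the last consumer lets go.
The sketch deliberately leaves out the lock that would be needed to
serialize the flag check against a concurrent unbind.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical driver-private structure; all names are illustrative. */
struct demo_adapter {
	atomic_int refcount;	/* one count per holder, binding included */
	atomic_bool unbound;	/* set once the supplier driver is gone */
	int *freed;		/* observation hook: set to 1 on release */
};

static struct demo_adapter *demo_adapter_alloc(int *freed)
{
	struct demo_adapter *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	atomic_init(&a->refcount, 1);	/* reference owned by the binding */
	atomic_init(&a->unbound, false);
	a->freed = freed;
	return a;
}

/* A consumer takes its own reference before using the adapter. */
static struct demo_adapter *demo_adapter_get(struct demo_adapter *a)
{
	atomic_fetch_add(&a->refcount, 1);
	return a;
}

/* The memory is released only when the last reference is dropped. */
static void demo_adapter_put(struct demo_adapter *a)
{
	if (atomic_fetch_sub(&a->refcount, 1) == 1) {
		if (a->freed)
			*a->freed = 1;
		free(a);
	}
}

/* Unbind marks the adapter dead and drops only the binding's reference. */
static void demo_adapter_unbind(struct demo_adapter *a)
{
	atomic_store(&a->unbound, true);
	demo_adapter_put(a);
}

/* Operations fail cleanly after unbind instead of touching freed memory. */
static int demo_adapter_xfer(struct demo_adapter *a)
{
	return atomic_load(&a->unbound) ? -ENODEV : 0;
}
```

A consumer that called demo_adapter_get() before the unbind would see
demo_adapter_xfer() return -ENODEV afterwards, and the structure would
be freed on its final demo_adapter_put().
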
> But
> that's not the synchronization part that leaks into the drivers, just
> the need to move struct device out of struct i2c_adapter.
>
> > This may involve tracking consumers like fw_devlink already does today
> > so that they are unbound before their dependencies are.
>
> During Saravana's talk at LPC we did briefly speak about whether it
> would be possible to enforce devlinks for ALL devices linked in a
> consumer-supplier fashion. I did in fact look into it for a bit on my
> way back and it too would require at least subsystem-level changes
> across all subsystems because you need to add that entry point at the
> time of the resource being requested so it's not a no-cost operation.
> But it is an alternative, yes, though it'll require a comparable
> amount of gap-plugging IMO.

I recall at least one driver (omap3isp) having a circular resource
issue. The ISP hardware block has the ability to produce a clock for the
camera sensor, and the camera sensor is a resource acquired by the ISP
driver. It's quite rare, but it happens. I would however not reject a
solution that solves 99.99% of the problem without addressing this.

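To illustrate why a cycle like this defeats pure ordering, here is a
small standalone sketch. It is not how fw_devlink works internally; the
demo_* names and the adjacency-matrix representation are assumptions
made for the example. It computes an unbind order in which every
consumer precedes its suppliers, and comes up short as soon as two
devices consume each other.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_DEVS 8

/* dep[c][s] is true when device c consumes a resource supplied by s. */
struct demo_graph {
	int ndevs;
	bool dep[DEMO_MAX_DEVS][DEMO_MAX_DEVS];
};

/*
 * Fill order[] so that every consumer is unbound before its suppliers.
 * Returns the number of devices placed; a result smaller than ndevs
 * means the remaining devices sit on, or are supplied through, a cycle.
 */
static int demo_unbind_order(const struct demo_graph *g, int order[])
{
	int consumers[DEMO_MAX_DEVS] = { 0 };	/* live consumers per device */
	bool done[DEMO_MAX_DEVS] = { false };
	int placed = 0;

	for (int c = 0; c < g->ndevs; c++)
		for (int s = 0; s < g->ndevs; s++)
			if (g->dep[c][s])
				consumers[s]++;

	for (;;) {
		int next = -1;

		/* Pick a device that nobody still bound depends on. */
		for (int d = 0; d < g->ndevs; d++) {
			if (!done[d] && consumers[d] == 0) {
				next = d;
				break;
			}
		}
		if (next < 0)
			break;

		done[next] = true;
		order[placed++] = next;

		/* Unbinding it releases the resources it was consuming. */
		for (int s = 0; s < g->ndevs; s++)
			if (g->dep[next][s])
				consumers[s]--;
	}

	return placed;
}
```

With ISP -> sensor -> regulator as a plain chain, the function orders
all three devices. Adding the sensor -> ISP clock dependency leaves no
device without a live consumer, so nothing can be unbound first.
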
> > Because in the end, how sound is a model where we allow critical
> > resources to silently go away while a device is still in use (e.g. you
> > won't discover that your emergency shutdown gpio is gone until you
> > actually need it)?
>
> Well, we do allow it at the moment. It doesn't seem like devlink will
> be able to cover 100% of use-cases anytime soon.

We have this issue because designing resource management is hard. The
decision we made not to pay that cost has now turned into a huge
technical debt. There's no easy way around it: it won't be easier to
solve it correctly today than it was years ago. I don't know when we
will be able to fix the issue, but I know it will happen only when we
decide to face the situation and stop applying band-aids.

What I think is the biggest issue at the moment is the lack of
motivation/time/money to address this huge debt, but I'm hopeful
because I trust the technical expertise of the community.

--
Regards,
Laurent Pinchart