lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260129103850.GA3317328@killaraus>
Date: Thu, 29 Jan 2026 12:38:50 +0200
From: Laurent Pinchart <laurent.pinchart@...asonboard.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Bartosz Golaszewski <bartosz.golaszewski@....qualcomm.com>,
	Johan Hovold <johan@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Danilo Krummrich <dakr@...nel.org>,
	"Rafael J . Wysocki" <rafael@...nel.org>,
	Tzung-Bi Shih <tzungbi@...nel.org>,
	Linus Walleij <linusw@...nel.org>, Jonathan Corbet <corbet@....net>,
	Shuah Khan <shuah@...nel.org>,
	Wolfram Sang <wsa+renesas@...g-engineering.com>,
	Simona Vetter <simona.vetter@...ll.ch>,
	Dan Williams <dan.j.williams@...el.com>, linux-doc@...r.kernel.org,
	linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
	Bartosz Golaszewski <brgl@...nel.org>
Subject: Re: [PATCH 0/3] Revert "revocable: Revocable resource management"

On Wed, Jan 28, 2026 at 09:23:22PM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote:
> > > The latter already have robust schemes to help the driver shutdown and
> > > end the concurrent operations. ie cancel_work_sync(),
> > > del_timer_sync(), free_irq(), and *notifier_unregister().
> > 
> > One a side note, devm_request_irq() is another of the devm_* helpers
> > that cause race conditions, as interrupt handlers can run right after
> > .remove() returns, which drivers will most likely not handle correctly.
> 
> Yes! You *cannot* intermix devm and non-devm approaches without
> creating very subtle bugs exactly like this. If your subsystem does
> not provide a "devm register" helper its drivers shouldn't use devm.

I'd relax that rule a bit. There are resources that drivers must never,
ever access after .remove(), such as MMIO. Using devm_ioremap* should
therefore be safe in all cases.

> > 1. At the beginning of .remove(), flag that the device is being removed.
> > 
> > 2. At the entry point of all fops, check the flag and return with an
> >    error if set. This prevents *new* fops from being entered once
> >    .remove() has started.
> > 
> > 3. At the entry point of all fops, flag that a fop is in progress (with
> >    a counter).
> > 
> > 4. In .remove(), wake up all threads sleeping in fops.
> > 
> > 5. At the exit point of all fops, decrease the fop-in-progress counter.
> > 
> > 6. In .remove(), wait until the fop-in-progress counter reaches 0. At
> >    that point, it is guaranteed that all fops will have completed and
> >    that no new fop can run.
> > 
> > 7. Complete .remove(), freeing resources.
> 
> This is is basically just open coding a rwsem.. :)

Yes, and that's why I wrote that my explanation was conceptual :-) I
think it's important for developers (at least the ones developing
subsystem integration for this, if not all driver authors) to understand
the concepts behind it. Then we can optimize performance with the right
kernel primitives. If we start the API documentation by talking about
SRCU, people will be lost.

> SRCU should be faster than this, IIRC it has less cache line bouncing.
> 
> But sure, it is all easy once you figure out how to give the fops shim
> some place to store all this state since people would not agree to
> make this a universal cost to all fops.

I didn't see any push back against Dan's proposal to store that
information in struct cdev, did I miss something ? 

> > > Yes there are other cases, and certainly I've commonly seen cases of
> > > drivers reaching into other drivers, and subsystem non-file ops, but
> > > these cases usualy have other more fundamental issues with remove
> > > races :(
> > 
> > Drivers using resources provided by other drivers is a more complex
> > problem. I'm thinking about a driver acquiring a GPIO in .probe(), and
> > the GPIO provider disappearing at a random point of time. Or a clock, or
> > a regulator. These issues are more rare if we ignore unbinding drivers
> > forcefully through sysfs, but they can still occur, so we should try to
> > find a solution here too (and the sysfs unbind issue is also worth
> > fixing). There, unlike in the cdev case, some sort of revocable API
> > could make sense.
> 
> IMHO the sanest answer is to force the depending driver to unplug
> before allowing the dependend driver to remove. Isn't that what the
> dev link stuff does? IDK it has been forever now since I've done
> embedded stuff..

I think it's the simplest answer, yes. It's a bit like using a canon to
kill a fly though, but all other solutions would I think be much more
complex.

-- 
Regards,

Laurent Pinchart

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ