lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZtWYWuqhqvdWd0Q7@phenom.ffwll.local>
Date: Mon, 2 Sep 2024 12:50:02 +0200
From: Daniel Vetter <daniel.vetter@...ll.ch>
To: Maxime Ripard <mripard@...nel.org>
Cc: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	David Airlie <airlied@...il.com>, Daniel Vetter <daniel@...ll.ch>,
	Laurent Pinchart <laurent.pinchart@...asonboard.com>,
	dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] drm/mipi-dsi: Fix devm unregister & detach

On Mon, Sep 02, 2024 at 11:26:11AM +0200, Maxime Ripard wrote:
> Hi,
> 
> On Wed, Aug 07, 2024 at 03:19:23PM GMT, Tomi Valkeinen wrote:
> > On 25/07/2024 14:28, Maxime Ripard wrote:
> > > On Mon, Jul 15, 2024 at 11:32:34AM GMT, Tomi Valkeinen wrote:
> > > > On 02/07/2024 14:43, Maxime Ripard wrote:
> > > > > Hi Tomi,
> > > > > 
> > > > > On Wed, Jun 26, 2024 at 06:53:40PM GMT, Tomi Valkeinen wrote:
> > > > > > On 26/06/2024 18:07, Maxime Ripard wrote:
> > > > > > > On Wed, Jun 26, 2024 at 12:55:39PM GMT, Tomi Valkeinen wrote:
> > > > > > > > On 26/06/2024 11:49, Maxime Ripard wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > On Wed, Jun 19, 2024 at 12:07:48PM GMT, Tomi Valkeinen wrote:
> > > > > > > > > > From: Tomi Valkeinen <tomi.valkeinen+renesas@...asonboard.com>
> > > > > > > > > > 
> > > > > > > > > > When a bridge driver uses devm_mipi_dsi_device_register_full() or
> > > > > > > > > > devm_mipi_dsi_attach(), the resource management is moved to devres,
> > > > > > > > > > which releases the resource automatically when the bridge driver is
> > > > > > > > > > unbound.
> > > > > > > > > > 
> > > > > > > > > > However, if the DSI host goes away first, the host unregistration code
> > > > > > > > > > will automatically detach and unregister any DSI peripherals, without
> > > > > > > > > > notifying the devres about it. So when the bridge driver later is
> > > > > > > > > > unbound, the resources are released a second time, leading to crash.
> > > > > > > > > 
> > > > > > > > > That's super surprising. mipi_dsi_device_unregister calls
> > > > > > > > > device_unregister, which calls device_del, which in turn calls
> > > > > > > > > devres_release_all.
> > > > > > > > 
> > > > > > > > Hmm, right.
> > > > > > > > 
> > > > > > > > > If that doesn't work like that, then it's what needs to be fixed, and
> > > > > > > > > not worked around in the MIPI-DSI bus.
> > > > > > > > 
> > > > > > > > Well, something causes a crash for both the device register/unregister case
> > > > > > > > and the attach/detach case, and the call stacks and debug prints showed a
> > > > > > > > double unregister/detach...
> > > > > > > > 
> > > > > > > > I need to dig up the board and check again why the devres_release_all() in
> > > > > > > > device_del() doesn't solve this. But I can probably only get back to this in
> > > > > > > > August, so it's perhaps best to ignore this patch for now.
> > > > > > > > 
> > > > > > > > However, the attach/detach case is still valid? I see no devres calls in the
> > > > > > > > detach paths.
> > > > > > > 
> > > > > > > I'm not sure what you mean by the attach/detach case. Do you expect
> > > > > > > device resources allocated in attach to be freed when detach run?
> > > > > > 
> > > > > > Ah, never mind, the devres_release_all() would of course deal with that too.
> > > > > > 
> > > > > > However, I just realized/remembered why it crashes.
> > > > > > 
> > > > > > devm_mipi_dsi_device_register_full() and devm_mipi_dsi_attach() are given a
> > > > > > device which is used for the devres. This device is probably always the
> > > > > > bridge device. So when the bridge device goes away, so do those resources.
> > > > > > 
> > > > > > The mipi_dsi_device_unregister() call deals with a DSI device, which was
> > > > > > created in devm_mipi_dsi_device_register_full(). Unregistering that DSI
> > > > > > device, which does happen when the DSI host is removed, does not affect the
> > > > > > devres of the bridge.
> > > > > > 
> > > > > > So, unloading the DSI host driver causes mipi_dsi_device_unregister() and
> > > > > > mipi_dsi_detach() to be called (as part of mipi_dsi_host_unregister()), and
> > > > > > unloading the bridge driver causes them to be called again via devres.
> > > > > 
> > > > > Sorry, that's one of the things I don't quite get. Both functions are
> > > > > exclusively(?) called from I2C bridges, so the device passed there
> > > > > should be a i2c_client instance, and thus the MIPI-DSI host going away
> > > > > will not remove those i2c devices, only the MIPI-DSI ones, right?
> > > > 
> > > > Yes.
> > > > 
> > > > > So if we remove the host, the MIPI-DSI device will be detached and
> > > > > removed through the path you were explaing with the i2c client lingering
> > > > > around. And if we remove the I2C device, then devm will kick in and will
> > > > > detach and remove the MIPI-DSI device.
> > > > 
> > > > Right.
> > > > 
> > > > > Or is it the other way around? That if you remove the host, the device
> > > > > is properly detached and removed, but there's still the devm actions
> > > > > lingering around in the i2c device with pointers to the mipi_dsi_device
> > > > > that was first created, but since destroyed?
> > > > > 
> > > > > And thus, if the i2c device ever goes away, we get a use-after-free?
> > > > 
> > > > Hmm, I'm not sure I understand what you mean here... Aren't you describing
> > > > the same thing in both of these cases?
> > > > 
> > > > In any case, to expand the description a bit, module unloading is quite
> > > > fragile. I do get a crash if I first unload the i2c bridge module, and only
> > > > then go and unload the other ones in the DRM pipeline. But I think module
> > > > unloading will very easily crash, whatever the DRM drivers being used are,
> > > > so it's not related to this particular issue.
> > > > 
> > > > In my view, the unload sequence that should be supported (for development
> > > > purposes, not for production) is to start the unload from the display
> > > > controller module, which tears down the DRM pipeline, and going from there
> > > > towards the panels/connectors.
> > > > 
> > > > Of course, it would be very nice if the module unloading worked perfectly,
> > > > but afaics fixing all that's related to module unloading would be a
> > > > multi-year project... So, I just want to keep the sequence I described above
> > > > working, which allows using modules while doing driver development.
> > > 
> > > FTR, I'm all for supporting module unloading. The discussion above was
> > > about what is broken exactly, so we can come up with a good solution.
> > 
> > Does that mean that you're ok with the patch, or that something should be
> > improved?
> 
> No, I meant that at the very least the commit log needs to be updated to
> reflect what is actually going on, because at least my understanding of
> it doesn't match what actually happens.
> 
> We want a solution to the problem you're facing, but it's not clear to
> me what the problem is exactly at this point, so it's hard to review a
> solution.

So I haven't looked at the full thing, but I think the proper fix is to
make both detach and unregister cope with being called multiple times. I
think devm_ here is a red herring, the underlying issues is that we can
unregister/detach from two sides:

- when the host dsi goes away
- when individual dsi devices on a given host go away

So there needs to be book-keeping and locking to make sure no matter which
order things disappear, we don't try to unregister/detach a dsi device
twice.

Now whether that device model is correct is maybe a different question,
but I think it's entirely orthogonal issue too.

Or maybe I'm just wrong ...

Cheers, Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ