Message-Id: <DFW4JJYIDC2O.3L1XXBT5MY9SI@kernel.org>
Date: Fri, 23 Jan 2026 17:54:21 +0100
From: "Danilo Krummrich" <dakr@...nel.org>
To: "Jon Hunter" <jonathanh@...dia.com>
Cc: "Gui-Dong Han" <hanguidong02@...il.com>, "Marek Szyprowski"
<m.szyprowski@...sung.com>, "Mark Brown" <broonie@...nel.org>,
<gregkh@...uxfoundation.org>, <rafael@...nel.org>,
<linux-kernel@...r.kernel.org>, <baijiaju1990@...il.com>, "Qiu-ji Chen"
<chenqiuji666@...il.com>, <Aishwarya.TCV@....com>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v5] driver core: enforce device_lock for
driver_match_device()
On Fri Jan 23, 2026 at 3:29 PM CET, Jon Hunter wrote:
> I can fix this by either:
>
> 1. Reverting this patch.
> 2. Disabling the QSPI driver.
>
> Now the QSPI driver has issues which need to be fixed, and I am
> wondering whether fixing them will avoid this problem.
>
> However, I guess regardless of the QSPI issue, should this patch be
> having such an impact?
So, this patch by itself is correct, but it exposes drivers that do the wrong
thing, i.e. register drivers from contexts where doing so neither makes sense
nor is supported by the driver core.
The deadlock happens when a driver (A) registers another driver (B) from a
context where the device lock of the device bound to (A) is held, e.g. from bus
callbacks, such as probe(). See also [1].
While this is never valid, the deadlock only occurs when (A) and (B) are on the
same bus, e.g. when a platform driver registers another platform driver in its
probe() callback.
However, it is a bit trickier than that: Let's say a platform driver registers
an SPI controller, then spi_register_controller() might scan the SPI bus and
register SPI devices (not drivers), which are then probed as well. So far this
is all fine, but if the probe() callback of one of those SPI drivers then
registers a platform driver, you have a deadlock condition as well.
So it seems that something of this kind is going on with
drivers/spi/spi-tegra210-quad.c.
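
A minimal user-space sketch of the two scenarios described above. It is a model,
not driver core code: struct model_device and all model_*() functions are
hypothetical names, the device lock is reduced to a flag, and the assumption
(taken from the description above) is that registering a driver locks every
device on that driver's bus for matching. Where a real mutex would block
forever, the model returns -EDEADLK.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a struct device and its device lock. */
struct model_device {
	const char *bus;	/* name of the bus the device sits on */
	bool lock_held;		/* true while the modelled device lock is held */
};

/*
 * Models driver registration: every device on the driver's bus is locked
 * for matching. A lock that is already held by the calling context would
 * never be released, so the model reports -EDEADLK instead of blocking.
 */
static int model_driver_register(const char *bus,
				 struct model_device **devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(devs[i]->bus, bus) != 0)
			continue;		/* devices on other buses are not walked */
		if (devs[i]->lock_held)
			return -EDEADLK;	/* self-deadlock on this device */
		/* lock, match, unlock: nothing to do in the model */
	}
	return 0;
}

/* Models probe() of driver (A): dev's lock is held while (B) is registered. */
static int model_probe_registers_driver(struct model_device *dev,
					const char *drv_bus,
					struct model_device **devs, size_t n)
{
	int ret;

	dev->lock_held = true;
	ret = model_driver_register(drv_bus, devs, n);
	dev->lock_held = false;
	return ret;
}

/*
 * Models the indirect case: a platform probe() is in progress, an SPI
 * device gets probed underneath it, and the SPI probe() registers a
 * platform driver.
 */
static int model_nested_probe(struct model_device *pdev,
			      struct model_device *spi_dev,
			      struct model_device **devs, size_t n)
{
	int ret;

	pdev->lock_held = true;
	ret = model_probe_registers_driver(spi_dev, "platform", devs, n);
	pdev->lock_held = false;
	return ret;
}
```

With one platform device and one SPI device in devs[], registering a platform
driver from the platform device's probe yields -EDEADLK, registering an SPI
driver from there yields 0 (different bus), and the nested case yields -EDEADLK
again because the platform device's lock is still held further up the call
chain.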
I already ran a fairly thorough analysis across the whole kernel tree with
various static analyzers and also experimented with LLMs to find this pattern.
The tools gave me two results:
(1) The IOMMU one I already fixed [2].
(2) The GPIO driver I posted a patch for in [3].
I also specifically looked at all drivers that are required to run the
peripherals in the tegra194-p3509-0000+p3668-0000.dts hierarchy, but couldn't
find anything.
(This is also why I asked about OOT, because there are quite a few compatible
strings that are not supported by any upstream driver.)
I think to really see what's going on with spi-tegra210-quad.c, we need the
dumps of the sysrq-triggers I provided in a previous mail.
I'd also recommend picking a stable state of the spi-tegra210-quad.c driver and
applying this patch on top (or just applying the spi-tegra210-quad.c fixes as
well).
Subsequently, we could retest with the diff I provided and the corresponding
lockdep options enabled, and with the sysrq-triggers (without the diff).
[1] https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/
[2] https://lore.kernel.org/all/20260121141215.29658-1-dakr@kernel.org/
[3] https://lore.kernel.org/all/20260123133614.72586-1-dakr@kernel.org/
> Please note that a lot of the boards I test are in a farm and I don't
> have direct access. So although I can see the test harness SSH'ing into
> the board, I am not accessing directly. However, we can run whatever
> tests we want.
Maybe you can trigger the sysrq-trigger from a custom test?
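
A minimal sketch of what such a test could do, assuming the harness can run a
small program as root on the board. The helper name sysrq_write() is
hypothetical. Which keys to send is given in the earlier mail; 'w' (dump blocked
tasks) and 'd' (show all held locks, requires lockdep) are named here only as
examples of existing sysrq commands.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single sysrq command character to the given path, normally
 * /proc/sysrq-trigger. Returns 0 on success or a negative errno value.
 * The path is a parameter so the helper can be exercised without root.
 */
static int sysrq_write(const char *path, char key)
{
	ssize_t n;
	int fd, ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, &key, 1);
	if (n == 1)
		ret = 0;
	else if (n < 0)
		ret = -errno;
	else
		ret = -EIO;	/* nothing was written */

	close(fd);
	return ret;
}
```

A test would call, for example, sysrq_write("/proc/sysrq-trigger", 'w') once the
hang is observed and then collect the kernel log; the output goes to the kernel
log, not to the caller.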