Message-Id: <DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org>
Date: Wed, 21 Jan 2026 11:40:45 +0100
From: "Danilo Krummrich" <dakr@...nel.org>
To: "Wang Jiayue" <akaieurus@...il.com>, <hanguidong02@...il.com>,
<gregkh@...uxfoundation.org>, <rafael@...nel.org>
Cc: <Aishwarya.TCV@....com>, <broonie@...nel.org>, <chenqiuji666@...il.com>,
<linux-kernel@...r.kernel.org>, <m.szyprowski@...sung.com>,
<robin.clark@....qualcomm.com>, <will@...nel.org>, <robin.murphy@....com>,
<joro@...tes.org>, <iommu@...ts.linux.dev>
Subject: Re: [PATCH v5] driver core: enforce device_lock for
driver_match_device()
(Cc: Rob, Will, Robin, Joerg)
On Wed Jan 21, 2026 at 9:55 AM CET, Wang Jiayue wrote:
> After partially modifying juno.dts, I managed to roughly emulate kernel
> boot on juno board with qemu and successfully reproduced the boot hang.
> Below is the gdb backtrace:
>
> #0 0xffff800080114ae0 in mutex_spin_on_owner (lock=0xffff0000036bfc90, owner=0xffff000003510000, ww_ctx=0x0, waiter=0x0) at kernel/locking/mutex.c:377
> #1 0xffff80008118cecc in mutex_optimistic_spin (waiter=<optimized out>, ww_ctx=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:480
> #2 __mutex_lock_common (use_ww_ctx=<optimized out>, ww_ctx=<optimized out>, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>, state=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:618
> #3 __mutex_lock (lock=0xffff0000036bfc90, state=0x2, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>) at kernel/locking/mutex.c:776
> #4 0xffff80008118d1dc in __mutex_lock_slowpath (lock=0xffff0000036bfc90) at kernel/locking/mutex.c:1065
> #5 0xffff80008118d230 in mutex_lock (lock=0xffff0000036bfc90) at kernel/locking/mutex.c:290
> #6 0xffff8000809cdd1c in device_lock (dev=<optimized out>) at ./include/linux/device.h:895
> #7 class_device_constructor (_T=<optimized out>) at ./include/linux/device.h:913
> #8 driver_match_device_locked (dev=<optimized out>, drv=<optimized out>) at drivers/base/base.h:193
> #9 __driver_attach (dev=0xffff0000036bfc10, data=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/dd.c:1183
> #10 0xffff8000809cb17c in bus_for_each_dev (bus=0xffff0000036bfc90, start=0x0, data=0xffff800082e64440 <qcom_smmu_tbu_driver+40>, fn=0xffff8000809cdcec <__driver_attach>) at drivers/base/bus.c:383
> #11 0xffff8000809cd03c in driver_attach (drv=0x0) at drivers/base/dd.c:1245
> #12 0xffff8000809cc748 in bus_add_driver (drv=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/bus.c:715
> #13 0xffff8000809ced28 in driver_register (drv=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/driver.c:249
> #14 0xffff8000809d0254 in __platform_driver_register (drv=0x0, owner=0xffff000003510000) at drivers/base/platform.c:908
> #15 0xffff8000809a6208 in qcom_smmu_impl_init (smmu=0xffff0000037c0080) at drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:780
> #16 0xffff8000809a48a0 in arm_smmu_impl_init (smmu=0xffff0000037c0080) at drivers/iommu/arm/arm-smmu/arm-smmu-impl.c:224
> #17 0xffff8000809a2ae0 in arm_smmu_device_probe (pdev=0xffff0000036bfc00) at drivers/iommu/arm/arm-smmu/arm-smmu.c:2155
> #18 0xffff8000809d060c in platform_probe (_dev=0xffff0000036bfc10) at drivers/base/platform.c:1446
> #19 0xffff8000809cd6a4 in call_driver_probe (drv=<optimized out>, dev=<optimized out>) at drivers/base/dd.c:583
> #20 really_probe (dev=0xffff0000036bfc10, drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/dd.c:661
> #21 0xffff8000809cd8f8 in __driver_probe_device (drv=0xffff800082e641c0 <arm_smmu_driver+40>, dev=0xffff0000036bfc10) at drivers/base/dd.c:803
> #22 0xffff8000809cdb34 in driver_probe_device (drv=0xffff0000036bfc90, dev=0xffff0000036bfc10) at drivers/base/dd.c:833
> #23 0xffff8000809cddb8 in __driver_attach (data=<optimized out>, dev=<optimized out>) at drivers/base/dd.c:1227
> #24 __driver_attach (dev=0xffff0000036bfc10, data=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/dd.c:1167
> #25 0xffff8000809cb17c in bus_for_each_dev (bus=0xffff0000036bfc90, start=0x0, data=0xffff800082e641c0 <arm_smmu_driver+40>, fn=0xffff8000809cdcec <__driver_attach>) at drivers/base/bus.c:383
> #26 0xffff8000809cd03c in driver_attach (drv=0x0) at drivers/base/dd.c:1245
> #27 0xffff8000809cc748 in bus_add_driver (drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/bus.c:715
> #28 0xffff8000809ced28 in driver_register (drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/driver.c:249
> #29 0xffff8000809d0254 in __platform_driver_register (drv=0x0, owner=0xffff000003510000) at drivers/base/platform.c:908
> #30 0xffff800081f3d12c in arm_smmu_driver_init () at drivers/iommu/arm/arm-smmu/arm-smmu.c:2368
> #31 0xffff800080015218 in do_one_initcall (fn=0xffff800081f3d10c <arm_smmu_driver_init>) at init/main.c:1378
> #32 0xffff800081ed13e4 in do_initcall_level (command_line=<optimized out>, level=<optimized out>) at init/main.c:1440
> #33 do_initcalls () at init/main.c:1456
> #34 do_basic_setup () at init/main.c:1475
> #35 kernel_init_freeable () at init/main.c:1688
> #36 0xffff800081187b50 in kernel_init (unused=0xffff0000036bfc90) at init/main.c:1578
> #37 0xffff800080015f58 in ret_from_fork () at arch/arm64/kernel/entry.S:860
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thanks, this backtrace is very helpful. My lockdep patch should reveal the same
issue once run on real hardware, but with this it's probably not even necessary
anymore.
So, the problem is that in the callstack of the arm-smmu driver's (a platform
driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
registers another platform driver. Since we are still in probe() of arm-smmu the
call to platform_driver_register() happens with the device lock of the arm-smmu
platform device held.
platform_driver_register() eventually results in driver_attach(), which iterates
over all the devices of a bus. Since the device we are probing and the driver we
are registering are on the same bus (i.e. the platform bus), it can happen that
we also try to match the very device that is currently being probed. And since
we now take the device lock for matching, we end up taking the same lock twice.
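To make the double acquisition concrete, here is a minimal userspace sketch; it
is not kernel code. It assumes pthread mutexes as a stand-in for device_lock(),
and all model_* names are hypothetical. An error-checking mutex is used so that
the second lock attempt returns EDEADLK instead of hanging the way the kernel
mutex does in the backtrace above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of a device; `lock` stands in for the device lock. */
struct model_device {
	pthread_mutex_t lock;
};

/* Use an error-checking mutex so a relock by the owner is reported. */
void model_device_init(struct model_device *dev)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&dev->lock, &attr);
	assert(ret == 0);
	(void)ret;
	pthread_mutexattr_destroy(&attr);
}

/* Models driver_attach(): walk the bus, locking each device for matching. */
int model_driver_attach(struct model_device *devs, int n)
{
	for (int i = 0; i < n; i++) {
		int ret = pthread_mutex_lock(&devs[i].lock);

		if (ret)
			return ret; /* EDEADLK: this thread already owns the lock */
		/* match() would run here with the device lock held */
		pthread_mutex_unlock(&devs[i].lock);
	}
	return 0;
}

/* Models probe() of device `idx` registering another driver on the same bus. */
int model_probe_registers_driver(struct model_device *devs, int n, int idx)
{
	int ret;

	pthread_mutex_lock(&devs[idx].lock);	/* held for the whole probe() */
	ret = model_driver_attach(devs, n);	/* nested registration */
	pthread_mutex_unlock(&devs[idx].lock);
	return ret;
}
```

Calling model_driver_attach() on its own succeeds, while going through
model_probe_registers_driver() fails as soon as the walk reaches the device
that is being probed.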
Now, we could avoid this by not matching bound devices, but whether a device is
bound is checked through dev->driver, which itself has to be read while holding
the device lock, so that doesn't help.
But on the other hand, I don't see any reason why a driver would call
platform_driver_register() from probe() in the first place. I think drivers
should not do that and instead just register the driver through a normal
initcall.
(If, however, it turns out that registering drivers from probe() is something we
really need for some reason, it is probably best to drop the patch and not make
any guarantees about whether match() is called with the device lock held or
not.
Consequently, driver_override must be protected with a separate lock (which
would be the cleaner solution in any case).)
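A rough userspace sketch of what a separate lock could look like follows; it is
only an illustration under assumptions, not a proposed patch. The ovr_* names
and the override_lock field are hypothetical, pthread mutexes stand in for
kernel locks, and falling back to a plain name comparison is a simplification
of real bus matching. The point is only that matching no longer needs the
device lock, so it is safe to call while probe() holds it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device with a dedicated lock for driver_override. */
struct ovr_device {
	pthread_mutex_t lock;		/* stands in for the device lock */
	pthread_mutex_t override_lock;	/* protects driver_override only */
	const char *name;
	const char *driver_override;
};

void ovr_device_init(struct ovr_device *dev, const char *name)
{
	int ret;

	ret = pthread_mutex_init(&dev->lock, NULL);
	assert(ret == 0);
	ret = pthread_mutex_init(&dev->override_lock, NULL);
	assert(ret == 0);
	(void)ret;
	dev->name = name;
	dev->driver_override = NULL;
}

/* Set or clear (NULL) the override under its own lock. */
void ovr_set_override(struct ovr_device *dev, const char *drv_name)
{
	pthread_mutex_lock(&dev->override_lock);
	dev->driver_override = drv_name;
	pthread_mutex_unlock(&dev->override_lock);
}

/* Match without touching dev->lock; returns 1 on match, 0 otherwise. */
int ovr_match(struct ovr_device *dev, const char *drv_name)
{
	const char *want;
	int matched;

	pthread_mutex_lock(&dev->override_lock);
	/* An override, if set, wins over the (simplified) name match. */
	want = dev->driver_override ? dev->driver_override : dev->name;
	matched = strcmp(want, drv_name) == 0;
	pthread_mutex_unlock(&dev->override_lock);
	return matched;
}
```

With this split, a caller that already holds dev->lock can still call
ovr_match() on the same device without blocking on itself.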