Message-ID: <2e22734e-6577-445d-af5e-846dbcce076e@linaro.org>
Date: Tue, 25 Jun 2024 10:20:05 +0200
From: Neil Armstrong <neil.armstrong@...aro.org>
To: Abel Vesa <abel.vesa@...aro.org>, Johan Hovold <johan@...nel.org>
Cc: Andy Gross <agross@...nel.org>, Bjorn Andersson <andersson@...nel.org>,
Konrad Dybcio <konrad.dybcio@...aro.org>, Lee Jones <lee@...nel.org>,
Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
"vkoul@...nel.org" <vkoul@...nel.org>,
Kishon Vijay Abraham I <kishon@...nel.org>,
Johan Hovold <johan+linaro@...nel.org>, linux-arm-msm@...r.kernel.org,
linux-phy@...ts.infradead.org, devicetree@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Lockdep broken on x1e80100
On 25/06/2024 09:37, Abel Vesa wrote:
> On 24-06-25 08:47:29, Johan Hovold wrote:
>> On Wed, Feb 08, 2023 at 09:01:53PM +0200, Abel Vesa wrote:
>>> This patchset adds support for the eUSB2 repeater found in PMIC PM8550B,
>>> used along with SM8550. Since there is no dedicated generic framework
>>> for eUSB2 repeaters, the most appropriate subsystem to model it is the
>>> generic phy. This patchset also adds support for such repeater to the
>>> eUSB2 PHY found in SM8550. Basically, the eUSB2 PHY will have its own
>>> "phy" which is actually a repeater.
>>
>> The decision to model the repeater as a PHY unfortunately breaks lockdep
>> as you now have functions like phy_init() calling phy_init() for a
>> second PHY (the repeater, see splat below).
>>
>
> This was reported by Bjorn off-list a couple of months ago. I did check
> it then and the locking order is perfectly fine. The solution here should
> be to use mutex_lock_nested() in the PHY framework, which would allow
> supporting chained PHYs. Moving the repeater out of the PHY framework
> was also discussed. Unfortunately, I didn't have the bandwidth to
> circle back and properly investigate and fix it.

Well, technically it is a PHY, and moving it out of the PHY framework would
basically duplicate the PHY core code, so we should rather make sure that
PHY code can be called safely from PHY callbacks.
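
To illustrate the ordering argument, here is a minimal userspace model (not kernel code) of a PHY whose init callback initialises a second PHY while holding its own lock, which is what qcom_snps_eusb2_hsphy_init() does with the repeater in the splat below. The struct, the depth field and all helper names are hypothetical stand-ins; the comment marks where a kernel patch might call mutex_lock_nested(&phy->mutex, subclass) with the nesting level as the subclass. What it shows is that the parent lock is always taken before the child lock, so the order is fixed and there is no ABBA inversion, only a missing annotation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct phy;

struct phy_ops {
	int (*init)(struct phy *phy);
};

/* Hypothetical userspace stand-in for struct phy, not the kernel's. */
struct phy {
	const char *name;
	pthread_mutex_t mutex;
	const struct phy_ops *ops;
	struct phy *child;	/* e.g. the eUSB2 repeater */
	unsigned int depth;	/* hypothetical: 0 for the outermost PHY */
	int init_count;
};

/* Log of acquisitions so the locking order can be checked afterwards. */
#define MAX_LOG 8
static const char *lock_log[MAX_LOG];
static unsigned int subclass_log[MAX_LOG];
static int n_log;

static int model_phy_init(struct phy *phy)
{
	int ret = 0;

	/* Kernel equivalent would be mutex_lock_nested(&phy->mutex, phy->depth). */
	pthread_mutex_lock(&phy->mutex);
	if (n_log < MAX_LOG) {
		lock_log[n_log] = phy->name;
		subclass_log[n_log] = phy->depth;
		n_log++;
	}

	/* Only run the callback on the first init, like init_count in phy_init(). */
	if (phy->init_count == 0 && phy->ops && phy->ops->init)
		ret = phy->ops->init(phy);
	if (ret == 0)
		phy->init_count++;

	pthread_mutex_unlock(&phy->mutex);
	return ret;
}

/* eUSB2-style callback: initialise the repeater while holding our own lock. */
static int eusb2_init(struct phy *phy)
{
	if (!phy->child)
		return 0;
	/* Always parent before child: one fixed order, so no ABBA deadlock. */
	assert(phy->child->depth == phy->depth + 1);
	return model_phy_init(phy->child);
}

static const struct phy_ops eusb2_ops = { .init = eusb2_init };

/* Build the two-level chain seen in the splat and initialise it. */
static int demo_chain_init(struct phy *eusb2, struct phy *repeater)
{
	*repeater = (struct phy){ .name = "repeater", .depth = 1 };
	*eusb2 = (struct phy){ .name = "eusb2", .ops = &eusb2_ops,
			       .child = repeater, .depth = 0 };
	pthread_mutex_init(&repeater->mutex, NULL);
	pthread_mutex_init(&eusb2->mutex, NULL);
	return model_phy_init(eusb2);
}
```

Calling demo_chain_init() records "eusb2" at subclass 0 followed by "repeater" at subclass 1. One caveat with using the nesting depth as the subclass: lockdep supports at most eight subclasses (MAX_LOCKDEP_SUBCLASSES), so this only scales to short chains.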
Neil
>
>> As long as the locks are always taken in the same order there should be
>> no risk for a deadlock, but can you please verify that and add the
>> missing lockdep annotation so that lockdep can be used on platforms like
>> x1e80100 (e.g. to prevent further locking issues from being introduced)?
>>
>> Johan
>>
>>
>> [ 8.613248] ============================================
>> [ 8.669073] WARNING: possible recursive locking detected
>> [ 8.669074] 6.10.0-rc5 #122 Not tainted
>> [ 8.669075] --------------------------------------------
>> [ 8.669075] kworker/u50:0/77 is trying to acquire lock:
>> [ 8.669076] ffff5cae8733ecf8 (&phy->mutex){+.+.}-{3:3}, at: phy_init+0x4c/0x12c
>> [ 8.669087]
>> but task is already holding lock:
>> [ 8.669088] ffff5cae8a056cf8 (&phy->mutex){+.+.}-{3:3}, at: phy_init+0x4c/0x12c
>> [ 8.669092]
>> other info that might help us debug this:
>> [ 8.669092] Possible unsafe locking scenario:
>>
>> [ 8.669093] CPU0
>> [ 8.669093] ----
>> [ 8.669094] lock(&phy->mutex);
>> [ 8.669095] lock(&phy->mutex);
>> [ 8.669097]
>> *** DEADLOCK ***
>>
>> [ 8.669097] May be due to missing lock nesting notation
>>
>> [ 8.669097] 4 locks held by kworker/u50:0/77:
>> [ 8.669099] #0: ffff5cae80010948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1a4/0x638
>> [ 8.669108] #1: ffff800080333de0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1cc/0x638
>> [ 8.669112] #2: ffff5cae854038f8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x38/0x1d4
>> [ 8.669117] #3: ffff5cae8a056cf8 (&phy->mutex){+.+.}-{3:3}, at: phy_init+0x4c/0x12c
>> [ 8.669121]
>> stack backtrace:
>> [ 8.669122] CPU: 9 PID: 77 Comm: kworker/u50:0 Not tainted 6.10.0-rc5 #122
>> [ 8.669124] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
>> [ 8.669125] Workqueue: events_unbound deferred_probe_work_func
>> [ 8.669128] Call trace:
>> [ 8.669129] dump_backtrace+0x9c/0x11c
>> [ 8.870384] show_stack+0x18/0x24
>> [ 8.870386] dump_stack_lvl+0x90/0xd0
>> [ 8.870391] dump_stack+0x18/0x24
>> [ 8.870393] print_deadlock_bug+0x25c/0x348
>> [ 8.870396] __lock_acquire+0x10a4/0x2064
>> [ 8.870399] lock_acquire.part.0+0xc8/0x20c
>> [ 8.870401] lock_acquire+0x68/0x84
>> [ 8.870403] __mutex_lock+0x98/0x428
>> [ 8.870407] mutex_lock_nested+0x24/0x30
>> [ 8.870410] phy_init+0x4c/0x12c
>> [ 8.870412] qcom_snps_eusb2_hsphy_init+0x54/0x420 [phy_qcom_snps_eusb2]
>> [ 8.870416] phy_init+0xe0/0x12c
>> [ 8.870418] dwc3_core_init+0x484/0x1214
>> [ 8.870421] dwc3_probe+0xe54/0x171c
>> [ 8.870424] platform_probe+0x68/0xd8
>> [ 8.870426] really_probe+0xc0/0x388
>> [ 8.870427] __driver_probe_device+0x7c/0x160
>> [ 8.870429] driver_probe_device+0x40/0x114
>> [ 8.870430] __device_attach_driver+0xbc/0x158
>> [ 8.870432] bus_for_each_drv+0x84/0xe0
>> [ 8.870433] __device_attach+0xa8/0x1d4
>> [ 8.870435] device_initial_probe+0x14/0x20
>> [ 8.870436] bus_probe_device+0xb0/0xb4
>> [ 8.870437] deferred_probe_work_func+0xa0/0xf4
>> [ 8.870439] process_one_work+0x224/0x638
>> [ 8.870441] worker_thread+0x268/0x3a8
>> [ 8.870442] kthread+0x124/0x128
>> [ 8.870443] ret_from_fork+0x10/0x20
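
For reference, this is why lockdep complains even though the two mutexes are different objects: lockdep tracks lock classes, not instances. As far as I can tell, every PHY's mutex is initialised by the same mutex_init(&phy->mutex) in phy_create(), so the eUSB2 PHY's mutex (ffff5cae8a056cf8) and the repeater's (ffff5cae8733ecf8) share the class &phy->mutex. With lock debugging enabled, mutex_lock() expands to mutex_lock_nested(lock, 0), which is why mutex_lock_nested shows up in the trace, and taking a second lock of the same class and subclass is reported as recursive locking. The toy checker below models only that single rule; it is a sketch, not lockdep's implementation, and its names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of one lockdep rule; this is not the kernel's implementation. */
#define MAX_HELD 16

struct held_lock_model {
	const char *class_name;	/* e.g. "&phy->mutex" */
	unsigned int subclass;	/* 0 for mutex_lock(), n for mutex_lock_nested(.., n) */
};

static struct held_lock_model held[MAX_HELD];
static int depth;

/* Returns false where lockdep would print "possible recursive locking detected". */
static bool model_acquire(const char *class_name, unsigned int subclass)
{
	for (int i = 0; i < depth; i++) {
		if (strcmp(held[i].class_name, class_name) == 0 &&
		    held[i].subclass == subclass)
			return false;
	}
	assert(depth < MAX_HELD);
	held[depth].class_name = class_name;
	held[depth].subclass = subclass;
	depth++;
	return true;
}

/* Drop every held lock, e.g. when the outer phy_init() returns. */
static void model_release_all(void)
{
	depth = 0;
}
```

Passing a distinct subclass for the inner lock, as mutex_lock_nested() allows, is what makes the second acquisition acceptable to this model, which matches the "May be due to missing lock nesting notation" hint in the splat.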