[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <42ef8c8f-2fc0-a210-969b-7b0d648d8226@samsung.com>
Date: Mon, 18 Sep 2023 14:33:04 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: "Russell King (Oracle)" <rmk+kernel@...linux.org.uk>, Andrew Lunn
<andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>
Cc: chenhao418@...wei.com, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Jijie Shao
<shaojijie@...wei.com>, lanhao@...wei.com, liuyonglong@...wei.com,
netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>,
shenjian15@...wei.com, wangjie125@...wei.com, wangpeiyang1@...wei.com
Subject: Re: [PATCH net-next 1/7] net: phy: always call
phy_process_state_change() under lock
Hi Russell,
On 14.09.2023 17:35, Russell King (Oracle) wrote:
> phy_stop() calls phy_process_state_change() while holding the phydev
> lock, so also arrange for phy_state_machine() to do the same, so that
> this function is called with consistent locking.
>
> Tested-by: Jijie Shao <shaojijie@...wei.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>
This change, merged to linux-next as commit 8da77df649c4 ("net: phy:
always call phy_process_state_change() under lock") introduces the
following deadlock with ASIX AX8817X USB driver:
--->8---
asix 1-1.4:1.0 (unnamed net_device) (uninitialized): PHY
[usb-001:003:10] driver [Asix Electronics AX88772A] (irq=POLL)
Asix Electronics AX88772A usb-001:003:10: attached PHY driver
(mii_bus:phy_addr=usb-001:003:10, irq=POLL)
asix 1-1.4:1.0 eth0: register 'asix' at usb-12110000.usb-1.4, ASIX
AX88772 USB 2.0 Ethernet, a2:99:b6:cd:11:eb
asix 1-1.4:1.0 eth0: configuring for phy/internal link mode
============================================
WARNING: possible recursive locking detected
6.6.0-rc1-00239-g8da77df649c4-dirty #13949 Not tainted
--------------------------------------------
kworker/3:3/71 is trying to acquire lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_start_aneg+0x1c/0x38
but task is already holding lock:
c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dev->lock);
lock(&dev->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/3:3/71:
#0: c1c090a8 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at:
process_one_work+0x148/0x608
#1: f0bddf20
((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at:
process_one_work+0x148/0x608
#2: c6c704cc (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x100/0x2b8
stack backtrace:
CPU: 3 PID: 71 Comm: kworker/3:3 Not tainted
6.6.0-rc1-00239-g8da77df649c4-dirty #13949
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_power_efficient phy_state_machine
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from __lock_acquire+0x1300/0x2984
__lock_acquire from lock_acquire+0x130/0x37c
lock_acquire from __mutex_lock+0x94/0x94c
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from phy_start_aneg+0x1c/0x38
phy_start_aneg from phy_state_machine+0x10c/0x2b8
phy_state_machine from process_one_work+0x204/0x608
process_one_work from worker_thread+0x1e0/0x498
worker_thread from kthread+0x104/0x138
kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0bddfb0 to 0xf0bddff8)
...
-------
This probably need to be fixed somewhere in drivers/net/usb/asix* but at
the first glance I don't see any obvious place that need a fix.
> ---
> drivers/net/phy/phy.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index df54c137c5f5..1e5218935eb3 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -1506,6 +1506,7 @@ void phy_state_machine(struct work_struct *work)
> if (err < 0)
> phy_error_precise(phydev, func, err);
>
> + mutex_lock(&phydev->lock);
> phy_process_state_change(phydev, old_state);
>
> /* Only re-schedule a PHY state machine change if we are polling the
> @@ -1516,7 +1517,6 @@ void phy_state_machine(struct work_struct *work)
> * state machine would be pointless and possibly error prone when
> * called from phy_disconnect() synchronously.
> */
> - mutex_lock(&phydev->lock);
> if (phy_polling_mode(phydev) && phy_is_started(phydev))
> phy_queue_state_machine(phydev, PHY_STATE_TIME);
> mutex_unlock(&phydev->lock);
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Powered by blists - more mailing lists