Message-ID: <20250912034538.1406132-1-zhangjian.3032@bytedance.com>
Date: Fri, 12 Sep 2025 11:45:38 +0800
From: Jian Zhang <zhangjian.3032@...edance.com>
To: netdev@...r.kernel.org,
davem@...emloft.net,
andrew+netdev@...n.ch,
guoheyi@...ux.alibaba.com
Cc: Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jacky Chou <jacky_chou@...eedtech.com>,
Simon Horman <horms@...nel.org>,
Heiner Kallweit <hkallweit1@...il.com>,
Jian Zhang <zhangjian.3032@...edance.com>,
Uwe Kleine-König <u.kleine-koenig@...libre.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH 1/1] Revert "drivers/net/ftgmac100: fix DHCP potential failure with systemd"

This reverts commit 1baf2e50e48f10f0ea07d53e13381fd0da1546d2.

The reverted commit can trigger a hung task when both of the following
happen at the same time:
* rtnetlink is setting the link down
* the PHY state_queue is triggered and calls ftgmac100_adjust_link

Within the rtnetlink flow, `cancel_delayed_work_sync` is called while
holding `rtnl_lock`. This function cancels a delayed work item or, if it
is already running, waits for it to complete. If the PHY state_queue
(the delayed work) is executing `adjust_link` at that moment, it calls
`ftgmac100_reset`, which tries to take `rtnl_lock`. The work item blocks
on the lock that rtnetlink holds, while rtnetlink waits for the work item
to finish, so both tasks deadlock.

This results in the following (partial) traces:
* rtnetlink (do_setlink):
[ 243.326104] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 243.334871] task:systemd-network state:D stack:0 pid:711 ppid:1 flags:0x0000080d
[ 243.344233] Call trace:
[ 243.346986] __switch_to+0xac/0xd8
[ 243.350814] __schedule+0x3c0/0xb78
[ 243.354734] schedule+0x60/0xc8
[ 243.358258] schedule_timeout+0x188/0x230
[ 243.362762] wait_for_completion+0x7c/0x168
[ 243.367461] __flush_work+0x29c/0x4c8
[ 243.371579] __cancel_work_timer+0x130/0x1b8
[ 243.376376] cancel_delayed_work_sync+0x18/0x28
[ 243.381463] phy_stop+0x7c/0x170
[ 243.385098] ftgmac100_stop+0x78/0xf0
[ 243.389213] __dev_close_many+0xb4/0x160
[ 243.393621] __dev_change_flags+0xfc/0x250
[ 243.398226] dev_change_flags+0x28/0x78
[ 243.402536] do_setlink+0x258/0xdb0
[ 243.406460] rtnl_setlink+0xf0/0x1b8
[ 243.410484] rtnetlink_rcv_msg+0x2a0/0x768
[ 243.415097] netlink_rcv_skb+0x64/0x138
[ 243.419473] rtnetlink_rcv+0x1c/0x30
[ 243.423540] netlink_unicast+0x1c8/0x2a8
[ 243.427973] netlink_sendmsg+0x1c4/0x438
[ 243.432402] __sys_sendto+0xe4/0x178
[ 243.436447] __arm64_sys_sendto+0x2c/0x40
[ 243.440966] invoke_syscall.constprop.0+0x60/0x108
[ 243.446397] do_el0_svc+0xa4/0xc8
[ 243.450171] el0_svc+0x48/0x118
[ 243.453710] el0t_64_sync_handler+0x118/0x128
[ 243.458648] el0t_64_sync+0x14c/0x150
* state_queue (phy_state_machine):
[ 242.882453] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.891226] task:kworker/3:0 state:D stack:0 pid:32 ppid:2 flags:0x00000008
[ 242.900592] Workqueue: events_power_efficient phy_state_machine
[ 242.907250] Call trace:
[ 242.910001] __switch_to+0xac/0xd8
[ 242.913813] __schedule+0x3c0/0xb78
[ 242.917735] schedule+0x60/0xc8
[ 242.921268] schedule_preempt_disabled+0x28/0x48
[ 242.926449] __mutex_lock+0x1cc/0x400
[ 242.930565] mutex_lock_nested+0x28/0x38
[ 242.934971] rtnl_lock+0x60/0x70
[ 242.938607] ftgmac100_reset+0x34/0x248
[ 242.942919] ftgmac100_adjust_link+0xe0/0x150
[ 242.947816] phy_link_change+0x34/0x68
[ 242.952032] phy_check_link_status+0x8c/0xf8
[ 242.956829] phy_state_machine+0x16c/0x2e0
[ 242.961428] process_one_work+0x258/0x620
[ 242.965934] worker_thread+0x1e8/0x3e0
[ 242.970148] kthread+0x114/0x120
[ 242.973762] ret_from_fork+0x10/0x20
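
The lock ordering visible in the two traces above can be modeled in
userspace. The following is only a minimal sketch using pthreads, not
kernel code: `fake_rtnl`, `struct sim`, `phy_worker` and
`link_down_would_deadlock` are hypothetical names, `pthread_join` stands in
for `cancel_delayed_work_sync`, and `pthread_mutex_trylock` replaces the
blocking `rtnl_lock` so the sketch reports the conflict instead of hanging.

```c
#include <pthread.h>

/* Hypothetical stand-in for rtnl_lock; not a kernel object. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct sim {
	int reset_inline;   /* 1: reset runs inside adjust_link (reverted commit) */
	int would_block;    /* set when the worker finds fake_rtnl already held */
	int reset_pending;  /* set when the reset is deferred instead */
};

/* Models ftgmac100_adjust_link running from the PHY state machine work. */
static void *phy_worker(void *arg)
{
	struct sim *s = arg;

	if (s->reset_inline) {
		/* The real path would sleep in rtnl_lock(); trylock only probes. */
		if (pthread_mutex_trylock(&fake_rtnl) == 0)
			pthread_mutex_unlock(&fake_rtnl);
		else
			s->would_block = 1;
	} else {
		/* Models schedule_work(&priv->reset_task): record and return. */
		s->reset_pending = 1;
	}
	return NULL;
}

/* Models the link-down path: take the lock, then wait for the worker.
 * Returns 1 if the worker would have blocked, 0 if not, -1 on error.
 */
static int link_down_would_deadlock(int reset_inline)
{
	struct sim s = { .reset_inline = reset_inline };
	pthread_t worker;

	pthread_mutex_lock(&fake_rtnl);          /* lock held by rtnetlink */
	if (pthread_create(&worker, NULL, phy_worker, &s) != 0) {
		pthread_mutex_unlock(&fake_rtnl);
		return -1;
	}
	pthread_join(worker, NULL);              /* wait for the work item */
	pthread_mutex_unlock(&fake_rtnl);

	if (s.reset_pending) {
		/* The deferred reset runs later, when the lock is free. */
		pthread_mutex_lock(&fake_rtnl);
		pthread_mutex_unlock(&fake_rtnl);
	}
	return s.would_block;
}
```

Under these assumptions, `link_down_would_deadlock(1)` returns 1 (the
worker needs the lock its waiter holds) and `link_down_would_deadlock(0)`
returns 0 (the worker only queues the reset and returns).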
Signed-off-by: Jian Zhang <zhangjian.3032@...edance.com>
---
drivers/net/ethernet/faraday/ftgmac100.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index a863f7841210..477719a518bc 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1448,17 +1448,8 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
/* Disable all interrupts */
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
- /* Release phy lock to allow ftgmac100_reset to acquire it, keeping lock
- * order consistent to prevent dead lock.
- */
- if (netdev->phydev)
- mutex_unlock(&netdev->phydev->lock);
-
- ftgmac100_reset(priv);
-
- if (netdev->phydev)
- mutex_lock(&netdev->phydev->lock);
-
+ /* Reset the adapter asynchronously */
+ schedule_work(&priv->reset_task);
}
static int ftgmac100_mii_probe(struct net_device *netdev)
--
2.47.0