[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250406221423.9723-2-ansuelsmth@gmail.com>
Date: Mon, 7 Apr 2025 00:13:54 +0200
From: Christian Marangi <ansuelsmth@...il.com>
To: Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Philipp Zabel <p.zabel@...gutronix.de>,
Christian Marangi <ansuelsmth@...il.com>,
Daniel Golle <daniel@...rotopia.org>,
"Russell King (Oracle)" <rmk+kernel@...linux.org.uk>,
netdev@...r.kernel.org,
devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org,
"Lei Wei (QUIC)" <quic_leiwei@...cinc.com>
Cc: stable@...r.kernel.org
Subject: [RFC PATCH net-next v2 01/11] net: phylink: fix possible circular locking dep with config in-band
Debug lock detection revealed a possible circular locking dependency between
phylink_major_config() and phylink_bringup_phy().
This was introduced by the addition of phy_config_inband(), which acquires
the phydev lock. This made the locking order in phylink_bringup_phy()
inconsistent, as it acquires the phydev lock before phylink's state_mutex.
A deadlock can occur when phylink_major_config() is called from
phylink_resolve(), where the state_mutex is taken first and then the phydev
lock. This is the reverse of the order used in phylink_bringup_phy().
To avoid this circular dependency, change the locking order in
phylink_bringup_phy() to match the pattern used in phylink_resolve(): take
state_mutex first, then the phydev lock.
A sample lockdep warning is included below for reference:
[ 147.749178]
[ 147.750682] ======================================================
[ 147.756850] WARNING: possible circular locking dependency detected
[ 147.763019] 6.14.0-next-20250404+ #0 Tainted: G O
[ 147.769189] ------------------------------------------------------
[ 147.775356] kworker/u16:0/12 is trying to acquire lock:
[ 147.780571] ffffff80ce9bcf08 (&dev->lock#2){+.+.}-{4:4}, at: phy_config_inband+0x44/0x90
[ 147.788672]
[ 147.788672] but task is already holding lock:
[ 147.794492] ffffff80c0dfbda0 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x2c/0x6a8
[ 147.802840]
[ 147.802840] which lock already depends on the new lock.
[ 147.802840]
[ 147.811002]
[ 147.811002] the existing dependency chain (in reverse order) is:
[ 147.818472]
[ 147.818472] -> #1 (&pl->state_mutex){+.+.}-{4:4}:
[ 147.824647] __mutex_lock+0x90/0x924
[ 147.828738] mutex_lock_nested+0x20/0x28
[ 147.833173] phylink_bringup_phy+0x210/0x700
[ 147.837954] phylink_fwnode_phy_connect+0xe0/0x124
[ 147.843256] phylink_of_phy_connect+0x18/0x20
[ 147.848124] dsa_user_create+0x210/0x414
[ 147.852561] dsa_port_setup+0xd4/0x14c
[ 147.856823] dsa_register_switch+0xbb0/0xe40
[ 147.861605] mt7988_probe+0xf8/0x140
[ 147.865694] platform_probe+0x64/0xbc
[ 147.869869] really_probe+0xbc/0x388
[ 147.873955] __driver_probe_device+0x78/0x154
[ 147.878823] driver_probe_device+0x3c/0xd4
[ 147.883430] __device_attach_driver+0xb0/0x144
[ 147.888383] bus_for_each_drv+0x6c/0xb0
[ 147.892732] __device_attach+0x9c/0x19c
[ 147.897078] device_initial_probe+0x10/0x18
[ 147.901771] bus_probe_device+0xa8/0xac
[ 147.906118] deferred_probe_work_func+0xb8/0x118
[ 147.911245] process_one_work+0x224/0x610
[ 147.915769] worker_thread+0x1b8/0x35c
[ 147.920029] kthread+0x11c/0x1e8
[ 147.923769] ret_from_fork+0x10/0x20
[ 147.927857]
[ 147.927857] -> #0 (&dev->lock#2){+.+.}-{4:4}:
[ 147.933686] __lock_acquire+0x12b8/0x1ff0
[ 147.938209] lock_acquire+0xf4/0x2d8
[ 147.942295] __mutex_lock+0x90/0x924
[ 147.946383] mutex_lock_nested+0x20/0x28
[ 147.950817] phy_config_inband+0x44/0x90
[ 147.955252] phylink_major_config+0x684/0xa64
[ 147.960120] phylink_resolve+0x24c/0x6a8
[ 147.964554] process_one_work+0x224/0x610
[ 147.969075] worker_thread+0x1b8/0x35c
[ 147.973335] kthread+0x11c/0x1e8
[ 147.977075] ret_from_fork+0x10/0x20
[ 147.981162]
[ 147.981162] other info that might help us debug this:
[ 147.981162]
[ 147.989150] Possible unsafe locking scenario:
[ 147.989150]
[ 147.995056] CPU0 CPU1
[ 147.999575] ---- ----
[ 148.004094] lock(&pl->state_mutex);
[ 148.007748] lock(&dev->lock#2);
[ 148.013572] lock(&pl->state_mutex);
[ 148.019742] lock(&dev->lock#2);
[ 148.023051]
[ 148.023051] *** DEADLOCK ***
[ 148.023051]
[ 148.028958] 3 locks held by kworker/u16:0/12:
[ 148.033304] #0: ffffff80c0011d48 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x1a8/0x610
[ 148.044082] #1: ffffffc081ca3dd8 ((work_completion)(&pl->resolve)){+.+.}-{0:0}, at: process_one_work+0x1d0/0x610
[ 148.054338] #2: ffffff80c0dfbda0 (&pl->state_mutex){+.+.}-{4:4}, at: phylink_resolve+0x2c/0x6a8
[ 148.063119]
[ 148.063119] stack backtrace:
[ 148.067465] CPU: 3 UID: 0 PID: 12 Comm: kworker/u16:0 Tainted: G O 6.14.0-next-20250404+ #0 NONE
[ 148.067472] Tainted: [O]=OOT_MODULE
[ 148.067474] Hardware name: Bananapi BPI-R4 2.5GE PoE (DT)
[ 148.067476] Workqueue: events_power_efficient phylink_resolve
[ 148.067482] Call trace:
[ 148.067484] show_stack+0x14/0x1c (C)
[ 148.067492] dump_stack_lvl+0x84/0xc0
[ 148.067497] dump_stack+0x14/0x1c
[ 148.067500] print_circular_bug+0x330/0x43c
[ 148.067505] check_noncircular+0x124/0x134
[ 148.067508] __lock_acquire+0x12b8/0x1ff0
[ 148.067512] lock_acquire+0xf4/0x2d8
[ 148.067516] __mutex_lock+0x90/0x924
[ 148.067521] mutex_lock_nested+0x20/0x28
[ 148.067527] phy_config_inband+0x44/0x90
[ 148.067531] phylink_major_config+0x684/0xa64
[ 148.067535] phylink_resolve+0x24c/0x6a8
[ 148.067539] process_one_work+0x224/0x610
[ 148.067544] worker_thread+0x1b8/0x35c
[ 148.067548] kthread+0x11c/0x1e8
[ 148.067552] ret_from_fork+0x10/0x20
Cc: stable@...r.kernel.org
Fixes: 5fd0f1a02e75 ("net: phylink: add negotiation of in-band capabilities")
Signed-off-by: Christian Marangi <ansuelsmth@...il.com>
---
drivers/net/phy/phylink.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 69ca765485db..4a1edf19dfad 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2072,8 +2072,8 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
dev_name(&phy->mdio.dev), phy->drv->name, irq_str);
kfree(irq_str);
- mutex_lock(&phy->lock);
mutex_lock(&pl->state_mutex);
+ mutex_lock(&phy->lock);
pl->phydev = phy;
pl->phy_state.interface = interface;
pl->phy_state.pause = MLO_PAUSE_NONE;
@@ -2115,8 +2115,8 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
phy_disable_eee(phy);
}
- mutex_unlock(&pl->state_mutex);
mutex_unlock(&phy->lock);
+ mutex_unlock(&pl->state_mutex);
phylink_dbg(pl,
"phy: %s setting supported %*pb advertising %*pb\n",
--
2.48.1
Powered by blists - more mailing lists