lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <1a7e74b4-8827-c14b-7371-9656a643d03c@linux.alibaba.com> Date: Sun, 20 Feb 2022 20:26:18 +0800 From: Heyi Guo <guoheyi@...ux.alibaba.com> To: Andrew Lunn <andrew@...n.ch> Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Joel Stanley <joel@....id.au>, Benjamin Herrenschmidt <benh@...nel.crashing.org>, Dylan Hung <dylan_hung@...eedtech.com>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org Subject: Re: [Issue report] drivers/ftgmac100: DHCP occasionally fails during boot up or link down/up Hi Andrew, There is indeed a dead lock warning after enabling prove_locking: [ 16.852199] ====================================================== [ 16.859102] WARNING: possible circular locking dependency detected [ 16.866012] 5.10.36-60b3c9d-dirty-15f4fba #1 Not tainted [ 16.871976] ------------------------------------------------------ [ 16.871991] kworker/1:1/23 is trying to acquire lock: [ 16.872000] 80fa0920 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x28 [ 16.872047] [ 16.872047] but task is already holding lock: [ 16.872051] 821d44c0 (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x50/0x290 [ 16.872076] [ 16.872076] which lock already depends on the new lock. [ 16.872076] [ 16.872080] [ 16.872080] the existing dependency chain (in reverse order) is: [ 16.872083] [ 16.872083] -> #1 (&dev->lock){+.+.}-{3:3}: [ 16.872106] lock_acquire+0x6c/0x74 [ 16.872117] __mutex_lock+0xb4/0xa48 [ 16.872132] mutex_lock_nested+0x2c/0x34 [ 16.872141] phy_start+0x30/0xc4 [ 16.872155] ftgmac100_open+0x1a0/0x254 [ 16.872168] __dev_open+0x110/0x1d0 [ 16.872180] __dev_change_flags+0x1d0/0x258 [ 16.872192] dev_change_flags+0x28/0x58 [ 16.872204] do_setlink+0x258/0xc60 [ 16.872212] rtnl_setlink+0x110/0x18c [ 16.872219] rtnetlink_rcv_msg+0x1d0/0x53c [ 16.872226] netlink_rcv_skb+0xd0/0x128 [ 16.872233] rtnetlink_rcv+0x20/0x24 [ 16.872244] netlink_unicast+0x1a8/0x26c [ 16.872254] netlink_sendmsg+0x220/0x464 [ 16.872265] __sys_sendto+0xe4/0x134 [ 16.872276] sys_sendto+0x24/0x2c [ 16.872288] ret_fast_syscall+0x0/0x28 [ 16.872297] 0x7ed9e928 [ 16.872301] [ 16.872301] -> #0 (rtnl_mutex){+.+.}-{3:3}: [ 16.872325] __lock_acquire+0x17e8/0x3268 [ 16.872331] lock_acquire.part.0+0xcc/0x394 [ 16.872341] lock_acquire+0x6c/0x74 [ 16.872354] __mutex_lock+0xb4/0xa48 [ 16.872365] mutex_lock_nested+0x2c/0x34 [ 16.872377] rtnl_lock+0x24/0x28 [ 16.872389] ftgmac100_adjust_link+0xc0/0x144 [ 16.872401] phy_link_change+0x38/0x64 [ 16.872411] phy_check_link_status+0xa8/0xfc [ 16.872422] phy_state_machine+0x80/0x290 [ 16.872435] process_one_work+0x294/0x7d8 [ 16.872447] worker_thread+0x6c/0x548 [ 16.872456] kthread+0x170/0x178 [ 16.872462] ret_from_fork+0x14/0x20 [ 16.872467] 0x0 [ 16.872471] [ 16.872471] other info that might help us debug this: [ 16.872471] [ 16.872475] Possible unsafe locking scenario: [ 16.872475] [ 16.872478] CPU0 CPU1 [ 16.872482] ---- ---- [ 16.872485] lock(&dev->lock); [ 16.872495] lock(rtnl_mutex); [ 16.872505] lock(&dev->lock); [ 16.872513] lock(rtnl_mutex); [ 16.872522] [ 16.872522] *** DEADLOCK *** [ 16.872522] [ 16.872528] 3 locks held by kworker/1:1/23: [ 16.872532] #0: 818472a8 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x1e8/0x7d8 [ 16.872558] #1: 819fbef8 ((work_completion)(&(&dev->state_queue)->work)){+.+.}-{0:0}, at: process_one_work+0x1e8/0x7d8 [ 16.872582] #2: 821d44c0 (&dev->lock){+.+.}-{3:3}, at: phy_state_machine+0x50/0x290 Any advice to get around of it? Thanks, Heyi 在 2022/2/20 上午2:28, Andrew Lunn 写道: > On Sat, Feb 19, 2022 at 06:08:35PM +0800, Heyi Guo wrote: >> Hi Andrew, >> >> The DHCP issue is gone after applying below patch. I put the lock statements >> outside of the pure reset function, for the phydev lock has been acquired >> before calling adjust_link. The lock order in ftgmac100_reset_task() was >> also changed, to make it the same as the lock procedure in adjust_link, in >> which the phydev is locked first and then rtnl_lock. I'm not quite sure >> whether it will bring in any potential dead lock. Any advice? > Did you run the code with CONFIG_PROVE_LOCKING enabled. That will help > detect possible deadlock situations. > > Andrew
Powered by blists - more mailing lists