[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <81562543-5ea1-4994-9503-90b5ff19b094@intel.com>
Date: Wed, 5 Feb 2025 15:20:07 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>
Subject: iAVF circular lock dependency due to netdev_lock
Hi,
I just discovered that iAVF now has a circular lock dependency on its
critical lock and the netdev lock:
> [ 1504.790308] ======================================================
> [ 1504.790309] WARNING: possible circular locking dependency detected
> [ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted
> [ 1504.790311] ------------------------------------------------------
> [ 1504.790312] kworker/u128:0/13566 is trying to acquire lock:
> [ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710
> [ 1504.790320]
> [ 1504.790320] but task is already holding lock:
> [ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf]
> [ 1504.790330]
> [ 1504.790330] which lock already depends on the new lock.
> [ 1504.790330]
> [ 1504.790330]
> [ 1504.790330] the existing dependency chain (in reverse order) is:
> [ 1504.790331]
> [ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}:
> [ 1504.790333] __lock_acquire+0x52d/0xbb0
> [ 1504.790337] lock_acquire+0xd9/0x330
> [ 1504.790338] mutex_lock_nested+0x4b/0xb0
> [ 1504.790341] iavf_finish_config+0x37/0x240 [iavf]
> [ 1504.790347] process_one_work+0x248/0x6d0
> [ 1504.790350] worker_thread+0x18d/0x330
> [ 1504.790352] kthread+0x10e/0x250
> [ 1504.790354] ret_from_fork+0x30/0x50
> [ 1504.790357] ret_from_fork_asm+0x1a/0x30
> [ 1504.790361]
> [ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}:
> [ 1504.790364] check_prev_add+0xf1/0xce0
> [ 1504.790366] validate_chain+0x46a/0x570
> [ 1504.790368] __lock_acquire+0x52d/0xbb0
> [ 1504.790370] lock_acquire+0xd9/0x330
> [ 1504.790371] mutex_lock_nested+0x4b/0xb0
> [ 1504.790372] register_netdevice+0x52c/0x710
> [ 1504.790374] iavf_finish_config+0xfa/0x240 [iavf]
> [ 1504.790379] process_one_work+0x248/0x6d0
> [ 1504.790381] worker_thread+0x18d/0x330
> [ 1504.790383] kthread+0x10e/0x250
> [ 1504.790385] ret_from_fork+0x30/0x50
> [ 1504.790387] ret_from_fork_asm+0x1a/0x30
> [ 1504.790389]
> [ 1504.790389] other info that might help us debug this:
> [ 1504.790389]
> [ 1504.790389] Possible unsafe locking scenario:
> [ 1504.790389]
> [ 1504.790390] CPU0 CPU1
> [ 1504.790391] ---- ----
> [ 1504.790391] lock(&adapter->crit_lock);
> [ 1504.790393] lock(&dev->lock);
> [ 1504.790394] lock(&adapter->crit_lock);
> [ 1504.790395] lock(&dev->lock);
> [ 1504.790397]
> [ 1504.790397] *** DEADLOCK ***
> [ 1504.790397]
> [ 1504.790397] 4 locks held by kworker/u128:0/13566:
> [ 1504.790399] #0: ffff97d1924d4d48 ((wq_completion)iavf){+.+.}-{0:0}, at: process_one_work+0x49b/0x6d0
> [ 1504.790404] #1: ffffa848c3d9be40 ((work_completion)(&adapter->finish_config)){+.+.}-{0:0}, at: process_one_work+0x1f3/0x6d0
> [ 1504.790408] #2: ffffffffb3e1bf40 (rtnl_mutex){+.+.}-{4:4}, at: iavf_finish_config+0x18/0x240 [iavf]
> [ 1504.790416] #3: ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf]
This happens because the driver takes netdev_lock prior to acquiring its
own adapter->crit_lock, but then it calls register_netdevice under the
crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock() protect
netdev->reg_state"), the register_netdevice() function now acquires
netdev_lock as part of its flow.
I can fix this by refactoring iavf to only take netdev_lock after
acquiring its own crit_lock.. but that smells funny. It seems like a
future change could require to take netdev_lock before calling into the
driver routines somehow, making that ordering problematic.
I'm not sure how else to fix this... I briefly considered just removing
crit_lock and relying solely on netdev_lock for synchronization, but
that doesn't work because of the register_netdevice() taking the lock.
I guess I could do some funky stuff with unlocking but that seems ugly
as well...
I'm not sure what we should do to fix this.
Powered by blists - more mailing lists