Message-ID: <81562543-5ea1-4994-9503-90b5ff19b094@intel.com>
Date: Wed, 5 Feb 2025 15:20:07 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>
Subject: iAVF circular lock dependency due to netdev_lock

Hi,

I just discovered that iAVF now has a circular lock dependency on its
critical lock and the netdev lock:

> [ 1504.790308] ====================================================== 
> [ 1504.790309] WARNING: possible circular locking dependency detected 
> [ 1504.790310] 6.13.0 #net_next_rt.c2933b2befe2.el9 Not tainted 
> [ 1504.790311] ------------------------------------------------------ 
> [ 1504.790312] kworker/u128:0/13566 is trying to acquire lock: 
> [ 1504.790313] ffff97d0e4738f18 (&dev->lock){+.+.}-{4:4}, at: register_netdevice+0x52c/0x710 
> [ 1504.790320]  
> [ 1504.790320] but task is already holding lock: 
> [ 1504.790321] ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf] 
> [ 1504.790330]  
> [ 1504.790330] which lock already depends on the new lock. 
> [ 1504.790330]  
> [ 1504.790330]  
> [ 1504.790330] the existing dependency chain (in reverse order) is: 
> [ 1504.790331]  
> [ 1504.790331] -> #1 (&adapter->crit_lock){+.+.}-{4:4}: 
> [ 1504.790333]        __lock_acquire+0x52d/0xbb0 
> [ 1504.790337]        lock_acquire+0xd9/0x330 
> [ 1504.790338]        mutex_lock_nested+0x4b/0xb0 
> [ 1504.790341]        iavf_finish_config+0x37/0x240 [iavf] 
> [ 1504.790347]        process_one_work+0x248/0x6d0 
> [ 1504.790350]        worker_thread+0x18d/0x330 
> [ 1504.790352]        kthread+0x10e/0x250 
> [ 1504.790354]        ret_from_fork+0x30/0x50 
> [ 1504.790357]        ret_from_fork_asm+0x1a/0x30 
> [ 1504.790361]  
> [ 1504.790361] -> #0 (&dev->lock){+.+.}-{4:4}: 
> [ 1504.790364]        check_prev_add+0xf1/0xce0 
> [ 1504.790366]        validate_chain+0x46a/0x570 
> [ 1504.790368]        __lock_acquire+0x52d/0xbb0 
> [ 1504.790370]        lock_acquire+0xd9/0x330 
> [ 1504.790371]        mutex_lock_nested+0x4b/0xb0 
> [ 1504.790372]        register_netdevice+0x52c/0x710 
> [ 1504.790374]        iavf_finish_config+0xfa/0x240 [iavf] 
> [ 1504.790379]        process_one_work+0x248/0x6d0 
> [ 1504.790381]        worker_thread+0x18d/0x330 
> [ 1504.790383]        kthread+0x10e/0x250 
> [ 1504.790385]        ret_from_fork+0x30/0x50 
> [ 1504.790387]        ret_from_fork_asm+0x1a/0x30 
> [ 1504.790389]  
> [ 1504.790389] other info that might help us debug this: 
> [ 1504.790389]  
> [ 1504.790389]  Possible unsafe locking scenario: 
> [ 1504.790389]  
> [ 1504.790390]        CPU0                    CPU1 
> [ 1504.790391]        ----                    ---- 
> [ 1504.790391]   lock(&adapter->crit_lock); 
> [ 1504.790393]                                lock(&dev->lock); 
> [ 1504.790394]                                lock(&adapter->crit_lock); 
> [ 1504.790395]   lock(&dev->lock); 
> [ 1504.790397]  
> [ 1504.790397]  *** DEADLOCK *** 
> [ 1504.790397]  
> [ 1504.790397] 4 locks held by kworker/u128:0/13566: 
> [ 1504.790399]  #0: ffff97d1924d4d48 ((wq_completion)iavf){+.+.}-{0:0}, at: process_one_work+0x49b/0x6d0 
> [ 1504.790404]  #1: ffffa848c3d9be40 ((work_completion)(&adapter->finish_config)){+.+.}-{0:0}, at: process_one_work+0x1f3/0x6d0 
> [ 1504.790408]  #2: ffffffffb3e1bf40 (rtnl_mutex){+.+.}-{4:4}, at: iavf_finish_config+0x18/0x240 [iavf] 
> [ 1504.790416]  #3: ffff97d0e47392e8 (&adapter->crit_lock){+.+.}-{4:4}, at: iavf_finish_config+0x37/0x240 [iavf] 


This happens because the driver takes netdev_lock prior to acquiring its
own adapter->crit_lock, but it also calls register_netdevice() while
holding crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock()
protect netdev->reg_state"), register_netdevice() acquires netdev_lock
as part of its flow, so the two paths now establish inverse lock orders.

I can fix this by refactoring iavf to only take netdev_lock after
acquiring its own crit_lock, but that smells funny. It seems like a
future change could require taking netdev_lock before calling into the
driver routines somehow, which would make that ordering problematic.

I'm not sure how else to fix this... I briefly considered just removing
crit_lock and relying solely on netdev_lock for synchronization, but
that doesn't work because register_netdevice() takes that lock itself.

I guess I could do some funky stuff with dropping and re-acquiring
locks, but that seems ugly as well...

I'm not sure what we should do to fix this.
