lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a376e87b-fbd8-4f07-9ab2-80a479782699@intel.com>
Date: Wed, 5 Feb 2025 18:27:40 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: iAVF circular lock dependency due to netdev_lock



On 2/5/2025 5:23 PM, Jakub Kicinski wrote:
> On Wed, 5 Feb 2025 15:20:07 -0800 Jacob Keller wrote:
>> This happens because the driver takes netdev_lock prior to acquiring its
>> own adapter->crit_lock, but then it calls register_netdevice under the
>> crit_lock. Since commit 5fda3f35349b ("net: make netdev_lock() protect
>> netdev->reg_state"), the register_netdevice() function now acquires
>> netdev_lock as part of its flow.
>>
>> I can fix this by refactoring iavf to only take netdev_lock after
>> acquiring its own crit_lock.. but that smells funny. It seems like a
>> future change could require to take netdev_lock before calling into the
>> driver routines somehow, making that ordering problematic.
>>
>> I'm not sure how else to fix this... I briefly considered just removing
>> crit_lock and relying solely on netdev_lock for synchronization, but
>> that doesn't work because of the register_netdevice() taking the lock.
>>
>> I guess I could do some funky stuff with unlocking but that seems ugly
>> as well...
>>
>> I'm not sure what we should do to fix this.
> 
> Not sure either, the locking in this driver is quite odd. Do you know
> why it's registering the netdev from a workqueue, and what the entry
> points to the driver are?
> 

Yes, the locking in iAVF has been problematic for years :(

We register the netdevice from a work queue because we are waiting on
messages from the PF over virtchnl. I don't fully understand the
motivation behind the way the initialization was moved into a work
queue, but this appears to be the historical reasoning from examining
commit logs.

> Normally before the netdev is registered it can't get called, so all 
> the locking is moot. But IDK if we need to protect from some FW
> interactions, maybe?

We had a lot of issues with locking and pain getting to the state we are
in today. I think part of the challenge is that the VF is communicating
asynchronously over virtchnl queue messages to the PF for setup.

Its a mess :( I could re-order the locks so we go "RTNL -> crit_lock ->
netdev_lock" but that will only work as long as no flow from the kernel
does something like "RTNL -> netdev_lock -> <driver callback that takes
crit lock>" which seems unlikely :(

Its a mess and I don't quite know how to dig out of it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ