Message-ID: <a356d2a0-e573-4e31-bae3-2a361476f937@intel.com>
Date: Mon, 22 Apr 2024 16:46:28 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Marek Marczykowski-Górecki
<marmarek@...isiblethingslab.com>
CC: Tony Nguyen <anthony.l.nguyen@...el.com>, <davem@...emloft.net>,
<kuba@...nel.org>, <pabeni@...hat.com>, <edumazet@...gle.com>,
<netdev@...r.kernel.org>, Lukas Wunner <lukas@...ner.de>,
<sasha.neftin@...el.com>, Roman Lozko <lozko.roma@...il.com>, Kurt Kanzenbach
<kurt@...utronix.de>, Heiner Kallweit <hkallweit1@...il.com>, Simon Horman
<horms@...nel.org>, Naama Meir <naamax.meir@...ux.intel.com>
Subject: Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind
On 4/22/2024 4:37 PM, Marek Marczykowski-Górecki wrote:
> On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
>> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
>>> From: Lukas Wunner <lukas@...ner.de>
>>>
>>> Roman reports a deadlock on unplug of a Thunderbolt docking station
>>> containing an Intel I225 Ethernet adapter.
>>>
>>> The root cause is that led_classdev's for LEDs on the adapter are
>>> registered such that they're device-managed by the netdev. That
>>> results in recursive acquisition of the rtnl_lock() mutex on unplug:
>>>
>>> When the driver calls unregister_netdev(), it acquires rtnl_lock(),
>>> then frees the device-managed resources. Upon unregistering the LEDs,
>>> netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
>>> which tries to acquire rtnl_lock() again.
>>>
>>> Avoid by using non-device-managed LED registration.
>>
>> Could we instead switch to using devm with the PCI device struct instead
>> of the netdev struct? That would make it still get automatically cleaned
>> up, but by cleaning it up only when the PCIe device goes away, which
>> should be after rtnl_lock() is released.
>
> Wouldn't that effectively leak memory if the driver is unbound from the
> device and then bound back (and possibly repeated multiple times)?
>
My understanding of devm is that when the driver is unbound (or unloaded),
the driver core runs the devm teardown actions, so the resources are only
held until driver remove rather than leaked across rebinds.

In the netdev case, the LEDs are released during unregister_netdev(),
while rtnl_lock() is held, instead of after the PCI driver's .remove()
callback returns.
To me, using devm with the PCI device as the owner should be equivalent
to unregistering the LEDs manually at the end of igc_remove().

Am I misunderstanding how devm works, or the order in which igc sets up
and tears down these LEDs?
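A toy userspace model of the two orderings discussed above, offered only as a sketch under simplifying assumptions. It is not kernel code: toy_devm_add(), toy_devres_release_all(), toy_unregister_netdev(), toy_unbind() and led_release() are hypothetical stand-ins for devm registration, devres_release_all(), unregister_netdev(), the unbind path through igc_remove(), and the LED teardown. It assumes, as the patch description states, that LED teardown re-acquires rtnl_lock(), and it counts a recursive acquisition instead of hanging. It models only lock ordering, not object lifetimes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code; all names are hypothetical. */
#define MAX_RES 8

struct toy_device {
	void (*release[MAX_RES])(void); /* devres-style release actions */
	int nres;
};

static bool rtnl_held;  /* stands in for rtnl_lock() being held */
static int deadlocks;   /* recursive lock attempts that would hang */

/* Non-recursive lock: a second acquisition is counted, not performed. */
static bool toy_rtnl_lock(void)
{
	if (rtnl_held) {
		deadlocks++;  /* a real mutex would block forever here */
		return false;
	}
	rtnl_held = true;
	return true;
}

static void toy_rtnl_unlock(void)
{
	rtnl_held = false;
}

/* Assumed LED teardown: netdev_trig_deactivate() ->
 * unregister_netdevice_notifier(), which takes rtnl_lock(). */
static void led_release(void)
{
	if (toy_rtnl_lock())
		toy_rtnl_unlock();
}

/* Record a release action owned by dev (like a devm_* registration). */
static void toy_devm_add(struct toy_device *dev, void (*fn)(void))
{
	assert(dev->nres < MAX_RES);
	dev->release[dev->nres++] = fn;
}

/* Run and drop all release actions in reverse (LIFO) order. */
static void toy_devres_release_all(struct toy_device *dev)
{
	while (dev->nres > 0)
		dev->release[--dev->nres]();
}

/* unregister_netdev(): netdev-owned resources are freed under rtnl. */
static void toy_unregister_netdev(struct toy_device *netdev)
{
	bool locked = toy_rtnl_lock();

	toy_devres_release_all(netdev);
	if (locked)
		toy_rtnl_unlock();
}

/* Driver unbind: .remove() runs first, then the PCI device's devres. */
static void toy_unbind(struct toy_device *pdev, struct toy_device *netdev)
{
	toy_unregister_netdev(netdev);  /* called from igc_remove() */
	toy_devres_release_all(pdev);   /* after .remove() returns */
}
```

In this model, registering led_release on the netdev and then unbinding records one recursive lock attempt, while registering it on the PCI device records none. The PCI device's release list is also empty after each unbind, so nothing accumulates across repeated bind/unbind cycles.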