Message-ID: <ZidpAp1CL3iKfcGz@wunner.de>
Date: Tue, 23 Apr 2024 09:53:38 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, davem@...emloft.net,
kuba@...nel.org, pabeni@...hat.com, edumazet@...gle.com,
netdev@...r.kernel.org, sasha.neftin@...el.com,
Roman Lozko <lozko.roma@...il.com>,
Marek Marczykowski-Górecki <marmarek@...isiblethingslab.com>,
Kurt Kanzenbach <kurt@...utronix.de>,
Heiner Kallweit <hkallweit1@...il.com>,
Simon Horman <horms@...nel.org>,
Naama Meir <naamax.meir@...ux.intel.com>
Subject: Re: [PATCH net] igc: Fix LED-related deadlock on driver unbind

On Mon, Apr 22, 2024 at 04:32:01PM -0700, Jacob Keller wrote:
> On 4/22/2024 1:45 PM, Tony Nguyen wrote:
> > Roman reports a deadlock on unplug of a Thunderbolt docking station
> > containing an Intel I225 Ethernet adapter.
> >
> > The root cause is that led_classdev's for LEDs on the adapter are
> > registered such that they're device-managed by the netdev. That
> > results in recursive acquisition of the rtnl_lock() mutex on unplug:
> >
> > When the driver calls unregister_netdev(), it acquires rtnl_lock(),
> > then frees the device-managed resources. Upon unregistering the LEDs,
> > netdev_trig_deactivate() invokes unregister_netdevice_notifier(),
> > which tries to acquire rtnl_lock() again.
> >
> > Avoid by using non-device-managed LED registration.
> >
>
> Could we instead switch to using devm with the PCI device struct instead
> of the netdev struct?

No, unfortunately that doesn't work:

The unregistering of the LEDs would then happen after unbind of the
pci_dev, i.e. after igc_release_hw_control() and pci_disable_device().
The LED registers aren't even accessible at that point, but the LEDs
are still exposed in sysfs. I tried that approach but then realized
it's a mistake:

https://lore.kernel.org/all/ZhBN9p1yOyciXkzw@wunner.de/

Andrew Lunn concurred and wrote that "LEDs need to be added and
explicitly removed within the life cycle of the netdev":

https://lore.kernel.org/all/7cfb1af7-3270-447a-a2cf-16c2af02ec29@lunn.ch/

We'd have to convert the igc driver to use devm_*() for everything to
avoid this ordering issue. I don't think that's something we can do
at this point in the cycle. The present patch fixes a regression
introduced with v6.9-rc1.
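
The ordering issue can be illustrated with a minimal userspace model.
This is only a sketch, not kernel code: every model_*() name is
hypothetical, and the single hw_accessible flag merely stands in for
the effect of igc_release_hw_control() and pci_disable_device(). It
assumes the usual devres behavior, where device-managed resources are
released only after the driver's remove callback has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of devres ordering; not kernel code. */
#define MODEL_MAX_DEVRES 8

typedef void (*model_release_fn)(void *ctx);

struct model_dev {
	model_release_fn release[MODEL_MAX_DEVRES];
	void *ctx[MODEL_MAX_DEVRES];
	size_t n_devres;
};

struct model_adapter {
	struct model_dev pci_dev;
	bool hw_accessible;        /* false once the hardware is shut down */
	bool led_registered;
	bool led_touched_dead_hw;  /* LED torn down after hardware was disabled */
};

/* Queue a release callback, as a devm_*() registration would. */
static void model_devm_add(struct model_dev *dev, model_release_fn fn, void *ctx)
{
	assert(dev->n_devres < MODEL_MAX_DEVRES);
	dev->release[dev->n_devres] = fn;
	dev->ctx[dev->n_devres] = ctx;
	dev->n_devres++;
}

/* Release queued resources in reverse order of registration. */
static void model_devres_release_all(struct model_dev *dev)
{
	while (dev->n_devres > 0) {
		dev->n_devres--;
		dev->release[dev->n_devres](dev->ctx[dev->n_devres]);
	}
}

static void model_led_unregister(void *ctx)
{
	struct model_adapter *a = ctx;

	if (!a->hw_accessible)
		a->led_touched_dead_hw = true;
	a->led_registered = false;
}

static void model_probe(struct model_adapter *a, bool use_devm)
{
	a->pci_dev.n_devres = 0;
	a->hw_accessible = true;
	a->led_registered = true;
	a->led_touched_dead_hw = false;
	if (use_devm)
		model_devm_add(&a->pci_dev, model_led_unregister, a);
}

static void model_unbind(struct model_adapter *a, bool use_devm)
{
	/* Body of the driver's remove callback. */
	if (!use_devm)
		model_led_unregister(a);  /* explicit teardown, hardware still up */
	a->hw_accessible = false;         /* hardware released and disabled */

	/* Only after remove returns are managed resources released. */
	model_devres_release_all(&a->pci_dev);
}

/* Returns true if the LED was unregistered after the hardware went away. */
static bool model_led_outlives_hw(bool use_devm)
{
	struct model_adapter a = {0};

	model_probe(&a, use_devm);
	model_unbind(&a, use_devm);
	return a.led_touched_dead_hw;
}
```

In this model, model_led_outlives_hw(true) reports that the LED is
torn down only after the hardware has become inaccessible, whereas the
explicit variant, model_led_outlives_hw(false), unregisters it while
the hardware is still up.
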

There's another reason this approach doesn't work:

The first argument to devm_led_classdev_register() has two purposes:

(1) It's used to manage the resource (i.e. LED is unregistered on unbind),
(2) but it's also used as the parent below which the LED appears in sysfs.

If I changed the argument to the pci_dev, the LED would suddenly appear
below the pci_dev in sysfs, instead of the netdev. So the patch would
result in an undesired change of behavior.

Of course we can discuss introducing a new devm_*() helper which accepts
separate device arguments for the two purposes above. But that would
likewise be something we can't do at this point in the cycle.
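
The dual-purpose argument, and what a helper with separate arguments
would change, can be sketched as a small userspace model. Nothing here
is an existing kernel interface: model_register_single() only mimics
the one-argument behavior described above, and model_register_split()
is a purely hypothetical shape for the helper being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_device {
	const char *name;
};

struct model_led {
	const struct model_device *lifetime_owner; /* whose unbind unregisters the LED */
	const struct model_device *sysfs_parent;   /* device the LED appears below */
};

/* One device argument serves both purposes, as described for the existing helper. */
static void model_register_single(const struct model_device *dev,
				  struct model_led *led)
{
	assert(dev != NULL && led != NULL);
	led->lifetime_owner = dev;
	led->sysfs_parent = dev;
}

/* Assumed shape of a helper taking separate arguments for the two purposes. */
static void model_register_split(const struct model_device *owner,
				 const struct model_device *parent,
				 struct model_led *led)
{
	assert(owner != NULL && parent != NULL && led != NULL);
	led->lifetime_owner = owner;
	led->sysfs_parent = parent;
}

/* True if switching the single argument to the pci_dev moves the LED in sysfs. */
static bool model_single_arg_moves_led(void)
{
	struct model_device netdev = { "netdev" };
	struct model_device pci_dev = { "pci_dev" };
	struct model_led before = {0}, after = {0};

	model_register_single(&netdev, &before);
	model_register_single(&pci_dev, &after);
	return before.sysfs_parent != after.sysfs_parent;
}

/* True if the split variant changes the owner while keeping the sysfs parent. */
static bool model_split_keeps_parent(void)
{
	struct model_device netdev = { "netdev" };
	struct model_device pci_dev = { "pci_dev" };
	struct model_led before = {0}, after = {0};

	model_register_single(&netdev, &before);
	model_register_split(&pci_dev, &netdev, &after);
	return after.sysfs_parent == before.sysfs_parent &&
	       after.lifetime_owner == &pci_dev;
}
```

With a single argument, changing the lifetime owner necessarily changes
the sysfs parent as well; only the split variant can change one without
the other.
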

We discussed the conundrum of the dual-purpose device argument in a
separate thread for r8169 (which suffered from the same LED deadlock):

https://lore.kernel.org/all/20240405205903.GA3458@wunner.de/

Thanks,

Lukas