[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7003f6f3-a90f-4423-982b-7862bb9a1489@gmail.com>
Date: Mon, 24 Jun 2024 21:57:36 +0200
From: Heiner Kallweit <hkallweit1@...il.com>
To: Andrew Lunn <andrew@...n.ch>, Hans de Goede <hdegoede@...hat.com>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>,
 Pavel Machek <pavel@....cz>, Lee Jones <lee@...nel.org>,
 Linux LEDs <linux-leds@...r.kernel.org>, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com, johanneswueller@...il.com,
 "Russell King (Oracle)" <linux@...linux.org.uk>,
 Genes Lists <lists@...ience.com>
Subject: Re: Hung tasks due to a AB-BA deadlock between the leds_list_lock
 rwsem and the rtnl mutex
On 31.05.2024 14:54, Andrew Lunn wrote:
>> I actually have been looking at a ledtrig-netdev lockdep warning yesterday
>> which I believe is the same thing. I'll include the lockdep trace below.
>>
>> According to lockdep there indeed is a ABBA (ish) cyclic deadlock with
>> the rtnl mutex vs led-triggers related locks. I believe that this problem
>> may be a pre-existing problem but this now actually gets hit in kernels >=
>> 6.9 because of commit 66601a29bb23 ("leds: class: If no default trigger is
>> given, make hw_control trigger the default trigger"). Before that commit
>> the "netdev" trigger would not be bound / set as phy LEDs trigger by default.
>>
>> +Cc Heiner Kallweit who authored that commit.
>>
>> The netdev trigger typically is not needed because the PHY LEDs are typically
>> under hw-control and the netdev trigger even tries to leave things that way
>> so setting it as the active trigger for the LED class device is basically
>> a no-op. I guess the goal of that commit is correctly have the triggers
>> file content reflect that the LED is controlled by a netdev and to allow
>> changing the hw-control mode without the user first needing to set netdev
>> as trigger before being able to change the mode.
> 
> It was not the intention that this triggers is loaded for all
> systems. It should only be those that actually have LEDs which can be
> controlled:
> 
> drivers/net/ethernet/realtek/r8169_leds.c:	led_cdev->hw_control_trigger = "netdev";
> drivers/net/ethernet/realtek/r8169_leds.c:	led_cdev->hw_control_trigger = "netdev";
> drivers/net/ethernet/intel/igc/igc_leds.c:	led_cdev->hw_control_trigger = "netdev";
> drivers/net/dsa/qca/qca8k-leds.c:		port_led->cdev.hw_control_trigger = "netdev";
> drivers/net/phy/phy_device.c:		cdev->hw_control_trigger = "netdev";
> 
> Reverting this patch does seem like a good way forward, but i would
> also like to give Heiner a little bit of time to see if he has a quick
> real fix.
> 
Sorry, I was on vacation and can reply only now.
I agree that a revert is appropriate because the actual issue that this change revealed
seems to be non-trivial to fix.
Drawback is that now a user who wants to control LEDs has to know that he first has to
manually load the netdev trigger module and set the netdev trigger.
That the netdev trigger module is now loaded by default on systems with r8169 I don't
really see as an issue. I think it's normal that this is opt-out.
If a user has a problem with this for whatever reason he has several options to
prevent this:
- disable R8169 LED support in kernel config
- disable netdev trigger in kernel config
- remove netdev trigger module
- blacklist netdev trigger module
>      Andrew
Heiner
Powered by blists - more mailing lists
 
