Date: Fri, 31 May 2024 10:39:40 +0200
From: "Linux regression tracking (Thorsten Leemhuis)"
 <regressions@...mhuis.info>
To: Pavel Machek <pavel@....cz>, Lee Jones <lee@...nel.org>,
 Linux LEDs <linux-leds@...r.kernel.org>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org, andrew@...n.ch,
 hkallweit1@...il.com, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com, johanneswueller@...il.com,
 "Russell King (Oracle)" <linux@...linux.org.uk>,
 Genes Lists <lists@...ience.com>,
 Linux kernel regressions list <regressions@...ts.linux.dev>
Subject: Hung tasks due to an AB-BA deadlock between the leds_list_lock rwsem
 and the rtnl mutex (was: 6.9.3 Hung tasks)

[adding the LED folks and the regressions list to the list of recipients]

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Lee, Pavel, could you please look into the regression report below? The
thread starts here:
https://lore.kernel.org/all/9d189ec329cfe68ed68699f314e191a10d4b5eda.camel@sapience.com/

Another report with somewhat similar symptoms can be found here:
https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/

See also Russell's analysis of that report below (many many thx for
that, much appreciated Russell!).

To my untrained eyes all of this sounds a lot like we still have a 6.9
regression related to the LED code somewhere. Reminder, we had earlier
trouble, but that was avoided through other measures:

* 3d913719df14c2 ("wifi: iwlwifi: Use request_module_nowait") /
https://lore.kernel.org/lkml/30f757e3-73c5-5473-c1f8-328bab98fd7d@candelatech.com/

* c04d1b9ecce565 ("igc: Fix LED-related deadlock on driver unbind") /
https://lore.kernel.org/all/ZhRD3cOtz5i-61PB@mail-itl/

* 19fa4f2a85d777 ("r8169: fix LED-related deadlock on module removal")

That iwlwifi commit even calls itself a "work around". The developer who
submitted it bisected the problem to a LED merge, but sadly that was the
end of it. :-/

Ciao, Thorsten

On 30.05.24 16:04, Russell King (Oracle) wrote:
> On Thu, May 30, 2024 at 09:36:45AM -0400, Genes Lists wrote:
>> On Thu, 2024-05-30 at 08:53 -0400, Genes Lists wrote:
>> This report for 6.9.1 could well be the same issue:
>> https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/
> 
> The reg_check_chans_work() thing in pid 285 is likely stuck on the
> rtnl lock. The same is true of pid 287.
> 
> That will be because of the thread (pid 663) that's stuck in
> __dev_open()...led_trigger_register(), where the rtnl lock will have
> been taken in that path. It looks to me like led_trigger_register()
> is stuck waiting for read access with the leds_list_lock rwsem.
> 
> There are only two places that take that rwsem in write mode, which
> are led_classdev_register_ext() and led_classdev_unregister(). Neither
> of these paths is blocking in v6.9.
> 
> Pid 641 doesn't look significant (it's probably waiting for either
> pid 285 or 287 to complete its work.)
> 
> Pid 666 looks like it is blocked waiting for exclusive write-access
> on the leds_list_lock - but it isn't holding that lock. This means
> there must already be some other reader or writer holding this lock.
> 
> Pid 722 doesn't look significant (same as pid 641).
> 
> Pid 760 is also waiting for the rtnl lock.
> 
> Pids 854 and 855 also don't look significant (same as pid 641).
> 
> And then we get to pid 858. This is in set_device_name(), which
> was called from led_trigger_set() and led_trigger_register().
> We know from pid 663 that led_trigger_register() can take a read
> on leds_list_lock, and indeed it does and then calls
> led_match_default_trigger(), which then goes on to call
> led_trigger_set(). Bingo, this is why pid 666 is blocked, which
> then blocks pid 663. pid 663 takes the rtnl lock, which blocks
> everything else _and_ also blocks pid 858 in set_device_name().
> 
> Lockdep would've found this... this is a classic AB-BA deadlock
> between the leds_list_lock rwsem and the rtnl mutex.
> 
> I haven't checked to see how that deadlock got introduced, that's
> for someone else to do.
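
For readers less familiar with the term, here is a minimal sketch of how an
AB-BA lock-order inversion like the one Russell describes can be spotted from
acquisition order alone. It is a toy model loosely in the spirit of lockdep,
not kernel code: the class LockOrderGraph and its methods are hypothetical
names made up for this illustration, and the two paths are a simplified
reading of the analysis above (pid 663 takes rtnl, then wants
leds_list_lock; pid 858 holds leds_list_lock for read, then wants rtnl). It
also ignores the rwsem detail that it is the waiting writer (pid 666) that
keeps the new reader out.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order tracker (hypothetical, for illustration only)."""

    def __init__(self):
        # edges[a] holds every lock that was acquired while a was held
        self.edges = defaultdict(set)

    def record_path(self, locks_in_order):
        """Record one code path as the sequence of locks it acquires."""
        held = []
        for lock in locks_in_order:
            for h in held:
                self.edges[h].add(lock)
            held.append(lock)

    def find_inversion(self):
        """Return a sorted pair (a, b) seen in both orders, else None."""
        for a in list(self.edges):
            for b in self.edges[a]:
                if a in self.edges.get(b, ()):
                    return tuple(sorted((a, b)))
        return None


graph = LockOrderGraph()
# Simplified path of pid 663: __dev_open() ... led_trigger_register()
graph.record_path(["rtnl", "leds_list_lock"])
# Simplified path of pid 858: led_trigger_register() ... set_device_name()
graph.record_path(["leds_list_lock", "rtnl"])

inversion = graph.find_inversion()
print(inversion)
```

Either path on its own records only one ordering and is harmless; the pair
is reported only once both orderings have been seen, which is why such a bug
can stay hidden until the two paths happen to race.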

P.S.:

#regzbot report: /
#regzbot introduced: f5c31bcf604d
#regzbot duplicate:
https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/
#regzbot summary: leds: Hung tasks due to an AB-BA deadlock between the
leds_list_lock rwsem and the rtnl mutex
