Message-ID: <ZliHhebSGQYZ/0S0@shell.armlinux.org.uk>
Date: Thu, 30 May 2024 15:04:53 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Genes Lists <lists@...ience.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org, andrew@...n.ch,
	hkallweit1@...il.com, davem@...emloft.net, edumazet@...gle.com,
	kuba@...nel.org, pabeni@...hat.com, johanneswueller@...il.com
Subject: Re: 6.9.3 Hung tasks

On Thu, May 30, 2024 at 09:36:45AM -0400, Genes Lists wrote:
> On Thu, 2024-05-30 at 08:53 -0400, Genes Lists wrote:
> > 
> > 
> This report for 6.9.1 could well be the same issue:
> 
> https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/

The reg_check_chans_work() thing in pid 285 is likely stuck on the
rtnl lock. The same is true of pid 287.

That will be because of the thread (pid 663) that's stuck in
__dev_open()...led_trigger_register(), where the rtnl lock will have
been taken in that path. It looks to me like led_trigger_register()
is stuck waiting for read access to the leds_list_lock rwsem.

There are only two places that take that rwsem in write mode, which
are led_classdev_register_ext() and led_classdev_unregister(). Neither
of these paths blocks in v6.9.

Pid 641 doesn't look significant (it's probably waiting for either
pid 285 or 287 to complete its work).

Pid 666 looks like it is blocked waiting for exclusive write-access
on the leds_list_lock - but it isn't holding that lock. This means
there must already be some other reader or writer holding this lock.

Pid 722 doesn't look significant (same as pid 641).

Pid 760 is also waiting for the rtnl lock.

Pids 854 and 855 also don't look significant (same as pid 641).

And then we get to pid 858. This is in set_device_name(), which
was called from led_trigger_set() and led_trigger_register().
We know from pid 663 that led_trigger_register() can take a read
on leds_list_lock, and indeed it does and then calls
led_match_default_trigger(), which then goes on to call
led_trigger_set(). Bingo, this is why pid 666 is blocked, which
then blocks pid 663. pid 663 takes the rtnl lock, which blocks
everything else _and_ also blocks pid 858 in set_device_name().
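
To make the chain above easier to follow, here is a rough sketch that
writes the blocking relationships down as a wait-for graph and walks it
until a task repeats. The edges are only a reading of the analysis in
this mail, not something extracted from the traces, and find_cycle()
and WAITS_FOR are made-up names for the illustration.

```python
def find_cycle(waits_for, start):
    """Follow "task -> task it is blocked on" edges from start.

    Returns the list of tasks forming a cycle, or None if the chain
    ends at a task that is not itself blocked.
    """
    position = {}   # task -> index in path when first visited
    path = []
    node = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None
    return path[position[node]:]


# Assumed edges, taken from the description above:
#   285, 287, 760 wait for the rtnl lock, held by 663
#   663 waits for a read on leds_list_lock, queued behind writer 666
#   666 waits for the current reader of leds_list_lock, 858
#   858 waits for the rtnl lock, held by 663
WAITS_FOR = {285: 663, 287: 663, 760: 663, 663: 666, 666: 858, 858: 663}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, 285))
```

Starting from any of the rtnl waiters, the walk ends up in the loop
formed by pids 663, 666 and 858, which is why none of them can make
progress.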

Lockdep would've found this... this is a classic AB-BA deadlock
between the leds_list_lock rwsem and the rtnl mutex.
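
For anyone unfamiliar with the term, the idea behind catching an AB-BA
inversion can be sketched in a few lines: remember every "A was held
while taking B" ordering ever seen, and complain when a later
acquisition would close a loop. This is a toy model only, not how
lockdep is actually implemented (it ignores reader/writer distinctions
among other things), and LockOrderTracker and replay() are hypothetical
names.

```python
from collections import defaultdict


class LockOrderTracker:
    """Toy lock-order checker: records held-before edges between locks."""

    def __init__(self):
        self.edges = defaultdict(set)   # lock -> locks taken while it was held
        self.held = defaultdict(list)   # task -> locks currently held

    def _path(self, src, dst):
        # Depth-first search over recorded ordering edges.
        stack = [(src, [src])]
        seen = set()
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            for nxt in self.edges.get(node, ()):
                stack.append((nxt, path + [nxt]))
        return None

    def acquire(self, task, lock):
        """Record the acquisition; return the offending lock cycle or None."""
        violation = None
        for h in self.held[task]:
            # If "lock ... -> h" is already known, adding "h -> lock"
            # closes a cycle in the ordering graph.
            path = self._path(lock, h)
            if path is not None and violation is None:
                violation = path + [lock]
            self.edges[h].add(lock)
        self.held[task].append(lock)
        return violation

    def release(self, task, lock):
        self.held[task].remove(lock)


def replay():
    """Replay the two orderings described in this mail, one after the other."""
    t = LockOrderTracker()
    # led_trigger_register() path: leds_list_lock, then rtnl.
    t.acquire("pid858", "leds_list_lock")
    t.acquire("pid858", "rtnl")
    t.release("pid858", "rtnl")
    t.release("pid858", "leds_list_lock")
    # __dev_open() path: rtnl, then leds_list_lock.
    t.acquire("pid663", "rtnl")
    return t.acquire("pid663", "leds_list_lock")


if __name__ == "__main__":
    print(replay())
```

Note that the two paths never have to run at the same time for the
inversion to be flagged; seeing both orderings once is enough.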

I haven't checked to see how that deadlock got introduced, that's
for someone else to do.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
