Message-ID: <9cc0310c-1fbd-4bfc-aad7-f092583bd81b@gmail.com>
Date: Wed, 25 Sep 2024 16:48:06 -0500
From: stuart hayes <stuart.w.hayes@...il.com>
To: Bert Karwatzki <spasswolf@....de>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: linux-kernel@...r.kernel.org, linux-next@...r.kernel.org
Subject: Re: hung tasks on shutdown in linux-next-202409{20,23,24,25}



On 9/25/2024 4:37 PM, Bert Karwatzki wrote:
> I managed to get the complete lockdep output via netconsole:
> 
> T1;systemd-shutdown[1]: All filesystems unmounted.
> T1;systemd-shutdown[1]: Deactivating swaps.
> T1;systemd-shutdown[1]: All swaps deactivated.
> T1;systemd-shutdown[1]: Detaching loop devices.
> T1;systemd-shutdown[1]: All loop devices detached.
> T1;systemd-shutdown[1]: Stopping MD devices.
> T1;systemd-shutdown[1]: All MD devices stopped.
> T1;systemd-shutdown[1]: Detaching DM devices.
> T1;systemd-shutdown[1]: All DM devices detached.
> T1;systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
> T1;systemd-shutdown[1]: Syncing filesystems and block devices.
> T1;systemd-shutdown[1]: Rebooting.
> T3113;psmouse serio1: Failed to disable mouse on isa0060/serio1#012 SUBSYSTEM=serio#012 DEVICE=+serio:serio1
> 
> Here I was curious whether the failed psmouse message is related to the deadlock.
> I checked the locks and I had similar messages on an unaffected kernel
> (commit 6ec41c442e55) and I had a deadlock in linux-next-20240920 without this
> message.
> 

Thanks for the info.

This definitely appears to be the issue with asynchronous shutdown, which
shouldn't happen anymore now that Greg has reverted the patches.

I'm looking at this now. The async shutdown makes each device wait for its
children and consumers to shut down before shutting down itself, but it depends
on the devices_kset list having those in the correct order.  The "fix async
shutdown hang" patch fixed a case where suppliers could end up later in this
list than their consumers, causing a circular dependency (and a hang that looks
like what you are seeing).
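
As a rough illustration of that ordering assumption, here is a toy model in
Python. It is not the kernel code: the function name, the device names, and the
simplification that shutdown walks the list from the tail backwards and a device
may only wait on entries later in the list are all assumptions made for the
sketch.

```python
def find_order_violations(order, waits_on):
    """Return (device, dependency) pairs that break the ordering assumption.

    order:    device names in list order (hypothetical stand-in for the
              devices_kset list); shutdown is assumed to walk it in reverse.
    waits_on: maps a device to the devices it must wait for before shutting
              down (its children and consumers).
    """
    position = {name: index for index, name in enumerate(order)}
    violations = []
    for device, dependencies in waits_on.items():
        for dependency in dependencies:
            # With a reverse walk, anything a device waits on has to sit
            # later in the list, so that it is reached first.
            if position[dependency] < position[device]:
                violations.append((device, dependency))
    return violations


# Hypothetical devices: a parent waits on its child, a supplier on its consumer.
waits_on = {"parent": ["child"], "supplier": ["consumer"]}

good_order = ["parent", "supplier", "child", "consumer"]
bad_order = ["parent", "consumer", "child", "supplier"]  # supplier after consumer

print(find_order_violations(good_order, waits_on))
print(find_order_violations(bad_order, waits_on))
```

In this model the second ordering reports the supplier/consumer pair, because
the supplier would be reached first and then wait on a consumer that the walk
has not reached yet.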

After that, Andrey Skvortsov reported seeing a hang, where it appears that a
parent device is later in the devices_kset list than a child device, which I
didn't realize could happen... I know how to fix that, but I'm looking at the
code more carefully now to try to understand exactly how that could happen
before I resubmit a new async shutdown patch.

