Message-ID: <4e58d34a-ce45-437a-95a2-3ba21f35bbb5@leemhuis.info>
Date: Thu, 7 Nov 2024 13:39:39 +0100
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: John Garry <john.g.garry@...cle.com>, tj@...nel.org,
jiangshanlai@...il.com, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com
Cc: jack@...e.cz, david@...morbit.com, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: workqueue lockup debug
On 24.10.24 17:49, John Garry wrote:
> Hi workqueue and scheduler maintainers,
>
> As reported in https://lore.kernel.org/linux-fsdevel/df9db1ce-17d9-49f1-
> ab6d-7ed9a4f1f9c0@...cle.com/T/
> #m506b9edb1340cdddd87c6d14d20222ca8d7e8796, I am experiencing a
> workqueue lockup for v6.12-rcX.
John, was this resolved in the meantime? This and the other thread[1] look
stalled, but I might be missing something. Asking because I have this on
my list of tracked regressions and wonder if it is something that would
better be solved one way or another before 6.12.
[1]
https://lore.kernel.org/lkml/63d6ceeb-a22f-4dee-bc9d-8687ce4c7355@oracle.com/
Ciao, Thorsten
> At the point it occurs, the system becomes unresponsive and I cannot
> bring it back to life.
>
> Enabling /proc/sys/kernel/softlockup_all_cpu_backtrace does not give
> anything extra in the way of debug. All I get is something like this:
>
> Message from syslogd@...rry-atomic-write-exp-e4-8-instance-20231214-1221
> at Oct 24 15:34:02 ...
> kernel:watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [mysqld:14352]
>
> Message from syslogd@...rry-atomic-write-exp-e4-8-instance-20231214-1221
> at Oct 24 15:34:02 ...
> kernel:BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0
> stuck for 30s!
>
> Message from syslogd@...rry-atomic-write-exp-e4-8-instance-20231214-1221
> at Oct 24 15:34:02 ...
> kernel:BUG: workqueue lockup - pool cpus=31 node=0 flags=0x0 nice=0
> stuck for 49s!
> ^C
>
> Can you advise on a robust method to get some debug from this system?
>
> Maybe this is a scheduler issue, as Dave mentioned in that same thread.
>
> Thanks,
> John
>
#regzbot poke