Message-ID: <a48564f002b31cb1a8db7680729aac91bc3d3b6b.camel@web.de>
Date: Wed, 25 Sep 2024 21:15:27 +0200
From: Bert Karwatzki <spasswolf@....de>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Stuart Hayes <stuart.w.hayes@...il.com>, linux-kernel@...r.kernel.org, 
	linux-next@...r.kernel.org, spasswolf@....de
Subject: Re: hung tasks on shutdown in linux-next-202409{20,23,24,25}

On Wednesday, 25.09.2024 at 14:09 +0200, Greg Kroah-Hartman wrote:
>
>
> Thanks for the report, I _just_ reverted all of these in my branch due
> to another report just like this.  I'll be glad to take them back after
> -rc1 if these issues can be worked out.
>
> So the next linux-next release should be good, OR if you could pull my
> driver-core.git driver-core-next branch to verify the revert worked for
> you, that would be great.
>
> thanks,
>
> greg k-h

The situation is a little more complicated: your branch (driver-core-next) works
fine (I just retested 10 reboot cycles with driver-core-next, commit 4f2c346e6216
as HEAD). The problems only occur when your branch is merged into linux-next.
I suspected that the bug is locking-related and recompiled next-20240925
with CONFIG_LOCKDEP=y.

These are the lock debugging options I used:

#
# Lock Debugging (spinlocks, mutexes, etc...)
#
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
# CONFIG_WW_MUTEX_SELFTEST is not set
# CONFIG_SCF_TORTURE_TEST is not set
# CONFIG_CSD_LOCK_WAIT_DEBUG is not set
# end of Lock Debugging (spinlocks, mutexes, etc...)
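
For anyone trying to reproduce this, here is a minimal sketch (not part of the
kernel tree) that checks whether a .config enables the options listed above.
The function names and the WANTED list are my own choices for illustration;
only the option names come from the listing.

```python
import re

# Options from the listing above that were set to "y" (illustrative subset).
WANTED = [
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_LOCKDEP",
]


def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if m:
            opts[m.group(1)] = "n"
            continue
        m = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def missing_options(text, wanted=WANTED):
    """Return the wanted options that are not set to 'y' in the config text."""
    opts = parse_config(text)
    return [name for name in wanted if opts.get(name) != "y"]
```

Calling missing_options(open(".config").read()) returns an empty list when all
of the listed options are enabled.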

With these .config options the bug becomes harder to trigger, but after 11
reboots I finally got a screen flooded with messages of the following type:

2 locks held by kworker/u64:251/3047
#0: ffff9fdf80d39548 ((wq_completion)async){+.+.}-{0:0}, at process_one_work+0x4a4/0x580
#1: ffffb54b11307e58 ((work_completion)(&entry->work)){+.+.}-{0:0}, at process_one_work+0x1c7/0x580
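
Since the screen was flooded with these reports, a rough sketch like the
following could summarize a captured log. It assumes the line format shown
above (with or without the colon after "at") and tolerates mail line
wrapping; the function name and regular expressions are mine, not anything
provided by the kernel.

```python
import re
from collections import Counter

# "<n> locks held by <comm>/<pid>"
HEADER = re.compile(r"(\d+) locks? held by (\S+)/(\d+)")
# "#<idx>: <addr> (<class>){<usage>}-{<a>:<b>}, at[:] <symbol+offset>"
LOCK = re.compile(
    r"#(\d+):\s+([0-9a-f]+)\s+\((.+?)\)\{[^}]*\}-\{\d+:\d+\},\s+at:?\s+(\S+)"
)


def parse_held_locks(text):
    """Return (tasks, locks) found in lockdep 'locks held by' output.

    tasks: list of (comm, pid, number_of_locks)
    locks: list of (lock_class, acquisition_site)
    """
    text = " ".join(text.split())  # undo line wrapping
    tasks = [(m.group(2), int(m.group(3)), int(m.group(1)))
             for m in HEADER.finditer(text)]
    locks = [(m.group(3), m.group(4)) for m in LOCK.finditer(text)]
    return tasks, locks


def count_lock_classes(text):
    """Count how often each lock class appears in the log."""
    _, locks = parse_held_locks(text)
    return Counter(cls for cls, _ in locks)
```

Running count_lock_classes() over a saved console log shows which lock
classes dominate the reports, which may help narrow down where the workers
are stuck.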


Bert Karwatzki

