lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <6ca2b58c-f689-447f-abc8-4e8dd9bf677a@afaics.de>
Date: Sun, 23 Jun 2024 13:39:41 +0200
From: Harald Dunkel <harri@...ics.de>
To: linux-kernel@...r.kernel.org
Subject: shutdown gets stuck on ancient Intel CPUs (>10 y)

Hi folks,

I've got quite a number of ancient systems that do not shut down
gracefully. On "reboot" they get stuck instead, with these messages
on the console (for example):

[3100272.289498] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5
[3100274.206164] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
[3100274.432631] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
[3100277.888319] NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
[3100278.939282] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[3100280.147329] rcu: INFO: rcu_preempt self-detected stall on CPU
[3100280.590580] rcu:   14-...!: (5248 ticks this GP) idle=afe4/1/0x4000000000000000 softirq=85687678/85687678 fqs=38
[3100280.713513] rcu: rcu_preempt kthread timer wakeup didn't happen for 5308 jiffies! g227828925 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[3100280.856203] rcu:   Possible timer handling issue on cpu=3 timer-softirq=46376324
[3100280.945837] rcu: rcu_preempt kthread starved for 5367 jiffies! g227828925 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=3
[3100281.077083] rcu:   Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[3100281.188462] rcu: RCU grace-period kthread stack dump:
[3100281.251112] rcu: Stack dump where RCU GP kthread last ran:
[3100282.053462] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
[3100284.061698] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [kworker/8:0:2191850]
[3100284.065698] watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [migration/9:62]
[3100284.077698] watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [migration/13:82]
[3100285.095828] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[3100288.045764] watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [etcd:1614]
[3100291.949829] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [etcd:556385]
[3100292.037830] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:21]
[3100292.081831] watchdog: BUG: soft lockup - CPU#15 stuck for 23s! [etcd:1613]
[3100293.997839] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
[3100308.078097] watchdog: BUG: soft lockup - CPU#14 stuck for 48s! [rasdaemon:2267297]
[3100312.062164] watchdog: BUG: soft lockup - CPU#8 stuck for 52s! [kworker/8:0:2191850]
[3100312.066164] watchdog: BUG: soft lockup - CPU#9 stuck for 48s! [migration/9:62]
[3100312.078164] watchdog: BUG: soft lockup - CPU#13 stuck for 48s! [migration/13:82]
[3100316.046230] watchdog: BUG: soft lockup - CPU#4 stuck for 49s! [etcd:1614]
[3100319.950295] watchdog: BUG: soft lockup - CPU#0 stuck for 48s! [etcd:556385]
[3100320.038297] watchdog: BUG: soft lockup - CPU#1 stuck for 49s! [migration/1:21]
[3100320.082297] watchdog: BUG: soft lockup - CPU#15 stuck for 49s! [etcd:1613]

These systems are >10 years old. The CPUs are Intel Xeon E5620 or similar. Kernel
is 6.1.90. An old NAS running with an Atom CPU D525 and kernel 6.7.5 is affected
as well.

The problem is quite reproducible (esp on the NAS), but it requires a little bit
of runtime. Immediately after startup the reboot works as expected. After a
few days it doesn't.


Every helpful comment is highly appreciated

Harri

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ