Message-ID: <f1d66585-64ee-4052-928a-bd5182ea4ca8@linux.microsoft.com>
Date: Tue, 7 May 2024 14:40:51 -0700
From: Vijay Balakrishna <vijayb@...ux.microsoft.com>
To: Linux kernel mailing list <linux-kernel@...r.kernel.org>,
 linux-arm-kernel@...ts.infradead.org
Cc: Tyler Hicks <tyhicks@...ux.microsoft.com>,
 Allen Pais <apais@...ux.microsoft.com>
Subject: Watchdog Reset on Idle CPU with a task on its runq

Hello,

We are seeing a watchdog reset on an ARM64 SoC running the v5.10.178 
(stable) kernel.  At the time of the reset, CPU 0 is running the idle 
task even though there is a runnable task on its CFS runq (rcu_sched in 
the output below).  We are wondering why a task would be left waiting to 
be scheduled on a CPU that is otherwise running the idle task.  What 
does this indicate about the state of CPU 0?  What else could we check 
in the kernel crash dump?  Any pointers would be appreciated.

Thanks,
Vijay

(panic log and crash tool output)

[530671.963762] Kernel panic - not syncing: SBSA Generic Watchdog timeout
[530671.970288] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G           O      5.10.178.13-microsoft-standard #1
[530671.980969] Hardware name: Overlake (DT)
[530671.984967] Call trace:
[530671.987499]  dump_backtrace+0x0/0x1f0
[530671.991238]  show_stack+0x1c/0x24
[530671.994630]  dump_stack+0xe0/0x13c
[530671.998107]  panic+0x198/0x3a4
[530672.001239]  sbsa_gwdt_set_timeout+0x0/0x7c
[530672.005498]  __handle_irq_event_percpu+0xf0/0x2ac
[530672.010277]  handle_irq_event+0x60/0x144
[530672.014275]  handle_fasteoi_irq+0x144/0x234
[530672.018533]  __handle_domain_irq+0x8c/0xcc
[530672.022704]  gic_handle_irq+0xc0/0x120
[530672.026527]  el1_irq+0xcc/0x180
[530672.029744]  cpuidle_enter_state+0x1fc/0x31c
[530672.034088]  cpuidle_enter+0x3c/0x50
[530672.037740]  do_idle+0x1e4/0x28c
[530672.041042]  cpu_startup_entry+0x28/0x2c
[530672.045042]  rest_init+0xc4/0xd0
[530672.048346]  arch_call_rest_init+0x14/0x1c
[530672.052517]  start_kernel+0x328/0x3a4
[530672.056267] SMP: stopping secondary CPUs
[530672.060450] Starting crashdump kernel...
[530672.064447] Bye!
crash> runq -c 0
CPU 0 RUNQUEUE: ffff07cf49233200
   CURRENT: PID: 0      TASK: ffffde8e444e8900  COMMAND: "swapper/0"
   RT PRIO_ARRAY: ffff07cf49233440
      [no tasks queued]
   CFS RB_ROOT: ffff07cf492332b0
      [120] PID: 11     TASK: ffff07ad40c10000  COMMAND: "rcu_sched"
crash> bt ffffde8e444e8900
PID: 0        TASK: ffffde8e444e8900  CPU: 0    COMMAND: "swapper/0"
  #0 [ffff800010003db0] __crash_kexec at ffffde8e4370b424
  #1 [ffff800010003e60] panic at ffffde8e4363b64c
  #2 [ffff800010003eb0] sbsa_gwdt_interrupt at ffffde8e43d92aa8
  #3 [ffff800010003ed0] __handle_irq_event_percpu at ffffde8e436b9720
  #4 [ffff800010003f40] handle_irq_event at ffffde8e436b99c4
  #5 [ffff800010003f70] handle_fasteoi_irq at ffffde8e436bff0c
  #6 [ffff800010003fa0] __handle_domain_irq at ffffde8e436b831c
  #7 [ffff800010003fe0] gic_handle_irq at ffffde8e43600974
--- <IRQ stack> ---
  #8 [ffffde8e444d3e50] el1_irq at ffffde8e43602288
  #9 [ffffde8e444d3e70] cpuidle_enter_state at ffffde8e43dd6190
#10 [ffffde8e444d3ed0] cpuidle_enter at ffffde8e43dd6314
#11 [ffffde8e444d3f10] do_idle at ffffde8e4368307c
#12 [ffffde8e444d3f70] cpu_startup_entry at ffffde8e4368314c
#13 [ffffde8e444d3f90] rest_init at ffffde8e4408d79c
#14 [ffffde8e444d3fb0] arch_call_rest_init at ffffde8e443b0730
#15 [ffffde8e444d3fe0] start_kernel at ffffde8e443b0a60
crash>
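
The condition visible in the runq output above (current task is swapper while a CFS task is still queued) could be checked across all CPUs by saving the output of crash's `runq` command to a file and scanning it. The following is only a minimal sketch: the function name `idle_cpus_with_queued_tasks` is hypothetical, and the parsing assumes the output looks exactly like the `runq -c 0` sample above, which may not hold for other crash versions.

```python
import re

# Sample text copied from the `runq -c 0` output in this report.
RUNQ_OUTPUT = """\
CPU 0 RUNQUEUE: ffff07cf49233200
   CURRENT: PID: 0      TASK: ffffde8e444e8900  COMMAND: "swapper/0"
   RT PRIO_ARRAY: ffff07cf49233440
      [no tasks queued]
   CFS RB_ROOT: ffff07cf492332b0
      [120] PID: 11     TASK: ffff07ad40c10000  COMMAND: "rcu_sched"
"""

# Assumed line shapes, based only on the sample above.
CPU_RE = re.compile(r"CPU (\d+) RUNQUEUE:")
CURRENT_RE = re.compile(r'CURRENT: PID: (\d+)\s+TASK: \w+\s+COMMAND: "([^"]+)"')
QUEUED_RE = re.compile(r'\[\s*\d+\] PID: (\d+)\s+TASK: \w+\s+COMMAND: "([^"]+)"')


def idle_cpus_with_queued_tasks(text):
    """Map CPU number -> queued commands, for CPUs whose current task is PID 0."""
    result = {}
    cpu = None
    current_is_idle = False
    for line in text.splitlines():
        m = CPU_RE.search(line)
        if m:
            # Start of a new per-CPU section.
            cpu = int(m.group(1))
            current_is_idle = False
            continue
        if cpu is None:
            continue
        m = CURRENT_RE.search(line)
        if m:
            # PID 0 is the per-CPU idle task (swapper/N).
            current_is_idle = m.group(1) == "0"
            continue
        m = QUEUED_RE.search(line)
        if m and current_is_idle:
            result.setdefault(cpu, []).append(m.group(2))
    return result


if __name__ == "__main__":
    for cpu, tasks in idle_cpus_with_queued_tasks(RUNQ_OUTPUT).items():
        print(f"CPU {cpu}: idle with queued tasks: {', '.join(tasks)}")
```

Run against the sample above, this reports CPU 0 with rcu_sched queued. It only flags the symptom and says nothing about why the idle loop did not reschedule.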
