Message-ID: <ABB1DAF0-048A-4373-9007-988D20F359DD@oracle.com>
Date: Thu, 3 Jul 2025 18:27:46 +0000
From: Himanshu Madhani <himanshu.madhani@...cle.com>
To: "glx@...utronix.de" <glx@...utronix.de>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: System hang with latest kernel v6.16.0-rc1 (rc2 & rc3)

Hi Folks, 

We are seeing a kernel hang while booting after the new 6.16-rc1 kernel is installed.

Here’s the stack trace that shows up:

[  297.656683] systemd-shutdown[1]: Rebooting with kexec.
[  513.790993] INFO: task kexec:19038 blocked for more than 122 seconds.
[  513.868087]       Not tainted 6.16.0-rc1.master.20250611.ol9.x86_64 #1
[  513.946210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  514.039923] task:kexec           state:D stack:0     pid:19038 tgid:19038 ppid:1      task_flags:0x400100 flags:0x00004002
[  514.172122] Call Trace:
[  514.201356]  <TASK>
[  514.226438]  __schedule+0x2d1/0x730
[  514.268161]  schedule+0x27/0x80
[  514.305717]  schedule_preempt_disabled+0x15/0x30
[  514.360954]  __mutex_lock.constprop.0+0x4be/0x8a0
[  514.417232]  msi_domain_get_virq+0xcc/0x110
[  514.467279]  pci_msix_write_tph_tag+0x3c/0x100
[  514.520441]  pcie_tph_set_st_entry+0x125/0x1d0
[  514.573605]  bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[  514.639258]  irq_set_affinity_notifier+0xdd/0x130
[  514.695534]  bnxt_free_irq+0x6e/0x110 [bnxt_en]
[  514.749746]  __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[  514.815404]  bnxt_close+0x3a/0x100 [bnxt_en]
[  514.866498]  __dev_close_many+0xab/0x220
[  514.913423]  __dev_change_flags+0x102/0x240
[  514.963464]  netif_change_flags+0x26/0x70
[  515.011424]  dev_change_flags+0x40/0xc0
[  515.057304]  devinet_ioctl+0x3aa/0x7a0
[  515.102142]  inet_ioctl+0x1d3/0x1f0
[  515.143863]  sock_do_ioctl+0x7a/0x140
[  515.187667]  __x64_sys_ioctl+0x9b/0x100
[  515.233545]  ? syscall_trace_enter+0x10c/0x1d0
[  515.286704]  do_syscall_64+0x84/0x940
[  515.330502]  ? refill_obj_stock+0x143/0x240
[  515.380543]  ? __dentry_kill+0x12e/0x190
[  515.427459]  ? __memcg_slab_free_hook+0xf4/0x150
[  515.482698]  ? __x64_sys_close+0x3d/0x80
[  515.529616]  ? kmem_cache_free+0x3fe/0x460
[  515.578614]  ? syscall_exit_work+0x118/0x150
[  515.629695]  ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[  515.698453]  ? do_syscall_64+0xba/0x940
[  515.744330]  ? mod_memcg_lruvec_state+0x1a2/0x1f0
[  515.800608]  ? __lruvec_stat_mod_folio+0x83/0xd0
[  515.855843]  ? __folio_mod_stat+0x26/0x80
[  515.903801]  ? set_ptes.isra.0+0x36/0x90
[  515.950723]  ? do_anonymous_page+0x103/0x4b0
[  516.001802]  ? __handle_mm_fault+0x394/0x6f0
[  516.052886]  ? count_memcg_events+0x15a/0x1a0
[  516.105008]  ? handle_mm_fault+0x24a/0x350
[  516.154003]  ? do_user_addr_fault+0x221/0x690
[  516.206122]  ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[  516.274887]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  516.335330] RIP: 0033:0x7fc96e903bcb
[  516.378086] RSP: 002b:00007ffcc7f78518 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  516.468683] RAX: ffffffffffffffda RBX: 000055dc432d8f80 RCX: 00007fc96e903bcb
[  516.554080] RDX: 00007ffcc7f78680 RSI: 0000000000008914 RDI: 0000000000000003
[  516.639482] RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000007
[  516.724882] R10: 000000000000005e R11: 0000000000000202 R12: 000055dc095468dd
[  516.810278] R13: 000055dc095468e4 R14: 00007ffcc7f78680 R15: 000055dc432d9020
[  516.895676]  </TASK>
[  516.921808] INFO: task kexec:19038 is blocked on a mutex likely owned by task kexec:19038.
[  517.020728] task:kexec           state:D stack:0     pid:19038 tgid:19038 ppid:1      task_flags:0x400100 flags:0x00004002
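
The hung-task line above names the same task (kexec:19038) as both the waiter and the likely owner of the mutex, which suggests a self-deadlock rather than contention between two tasks. As a triage aid, here is a minimal Python sketch that scans a saved console log for such lines. The function name find_self_deadlocks and the regular expression are illustrative assumptions based only on the message format shown in this report, not part of any kernel tooling.

```python
import re

# Assumed format, taken from the hung-task message quoted in this report:
#   INFO: task <comm>:<pid> is blocked on a mutex likely owned by task <comm>:<pid>.
OWNER_RE = re.compile(
    r"INFO: task (?P<blocked>\S+:\d+) is blocked on a mutex "
    r"likely owned by task (?P<owner>\S+:\d+)\."
)

def find_self_deadlocks(log_text):
    """Return comm:pid strings for tasks reported as blocked on a mutex they likely own."""
    hits = []
    for line in log_text.splitlines():
        match = OWNER_RE.search(line)
        if match and match.group("blocked") == match.group("owner"):
            hits.append(match.group("blocked"))
    return hits

sample = (
    "[  516.921808] INFO: task kexec:19038 is blocked on a mutex "
    "likely owned by task kexec:19038."
)
print(find_self_deadlocks(sample))  # prints ['kexec:19038']
```

A hit here only repeats what the hung-task detector already guessed ("likely owned"); it does not confirm which lock is held or where it was taken.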


git bisect points to this merge commit:

commit 6376c0770656f3bdf7f411faf068371b6932aeca
Merge: 5e8bbb2caa4e 29857e6f4e30
Author: Linus Torvalds <torvalds@...ux-foundation.org>
Date:   Tue May 27 09:01:26 2025 -0700

    Merge tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull clocksource updates from Thomas Gleixner:
     "Updates for clocksource/clockevent drivers:
    
       - The final conversion of text formatted device tree binding to
         schemas
    
       - A new driver fot the System Timer Module on S32G NXP SoCs
    
       - A new driver fot the Econet HPT timer
    
       - The usual improvements and device tree binding updates"
    
    * tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
      clocksource/drivers/renesas-ostm: Unconditionally enable reprobe support
      dt-bindings: timer: renesas,ostm: Document RZ/V2N (R9A09G056) support
      dt-bindings: timer: Convert marvell,armada-370-timer to DT schema
      dt-bindings: timer: Convert ti,keystone-timer to DT schema
      dt-bindings: timer: Convert st,spear-timer to DT schema
      dt-bindings: timer: Convert socionext,milbeaut-timer to DT schema
      dt-bindings: timer: Convert snps,arc-timer to DT schema
      dt-bindings: timer: Convert snps,archs-rtc to DT schema
      dt-bindings: timer: Convert snps,archs-gfrc to DT schema
      dt-bindings: timer: Convert lsi,zevio-timer to DT schema
      dt-bindings: timer: Convert jcore,pit to DT schema
      dt-bindings: timer: Convert img,pistachio-gptimer to DT schema
      dt-bindings: timer: Convert ezchip,nps400-timer to DT schema
      dt-bindings: timer: Convert cirrus,clps711x-timer to DT schema
      dt-bindings: timer: Convert altr,timer-1.0 to DT schema
      dt-bindings: timer: Add ESWIN EIC7700 CLINT
      clocksource/drivers: Add EcoNet Timer HPT driver
      dt-bindings: timer: Add EcoNet EN751221 "HPT" CPU Timer
      dt-bindings: timer: Convert arm,mps2-timer to DT schema
      dt-bindings: timer: Add Sophgo SG2044 ACLINT timer
      …

Looking further into this merge, the only series I see with changes that may or may not be related to the hang is this one:

https://lore.kernel.org/all/20250429065337.117370076@linutronix.de/

I am not very familiar with this subsystem and was hoping somebody could spot the offending commit and possibly provide a fix for this hang.

Note that we also tried rc3 to see whether a fix had been applied in a later RC, and we still see the same issue:

[  525.390801] INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
[  525.474133]       Tainted: G S                  6.16.0-rc3.master.20250625.ol9.x86_64 #1
[  525.570969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  525.664681] task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00004002
[  525.796878] Call Trace:
[  525.826116]  <TASK>
[  525.851195]  __schedule+0x2d1/0x730
[  525.892917]  schedule+0x27/0x80
[  525.930478]  schedule_preempt_disabled+0x15/0x30
[  525.985718]  __mutex_lock.constprop.0+0x4be/0x8a0
[  526.041993]  msi_domain_get_virq+0xcc/0x110
[  526.092031]  pci_msix_write_tph_tag+0x3c/0x100
[  526.145186]  pcie_tph_set_st_entry+0x125/0x1d0
[  526.198346]  bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
[  526.264015]  irq_set_affinity_notifier+0xe0/0x130
[  526.320291]  bnxt_free_irq+0x6e/0x110 [bnxt_en]
[  526.374507]  __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
[  526.440175]  bnxt_close+0x3a/0x100 [bnxt_en]
[  526.491264]  __dev_close_many+0xae/0x220
[  526.538179]  dev_close_many+0xc2/0x1b0
[  526.583014]  netif_close+0x9d/0xd0
[  526.623693]  bnxt_shutdown+0xb1/0xe0 [bnxt_en]
[  526.676874]  pci_device_shutdown+0x35/0x70
[  526.725871]  device_shutdown+0x118/0x1a0
[  526.772788]  kernel_restart+0x3a/0x70
[  526.816588]  __do_sys_reboot+0x150/0x250
[  526.863504]  do_syscall_64+0x84/0x940
[  526.907300]  ? __put_user_8+0xd/0x20
[  526.950059]  ? rseq_ip_fixup+0x90/0x1e0
[  526.995937]  ? task_mm_cid_work+0x1ad/0x220
[  527.045971]  ? __rseq_handle_notify_resume+0x35/0x90
[  527.105367]  ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
[  527.175166]  ? do_syscall_64+0xba/0x940
[  527.221040]  ? do_filp_open+0xd7/0x1a0
[  527.265882]  ? alloc_fd+0xba/0x110
[  527.306556]  ? do_sys_openat2+0xa4/0xf0
[  527.352434]  ? __x64_sys_openat+0x54/0xb0
[  527.400389]  ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
[  527.469150]  ? do_syscall_64+0xba/0x940
[  527.515023]  ? do_user_addr_fault+0x221/0x690
[  527.567141]  ? clear_bhb_loop+0x30/0x80
[  527.613017]  ? clear_bhb_loop+0x30/0x80
[  527.658895]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  527.719332] RIP: 0033:0x7fc3ec504777
[  527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
[  527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
[  527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[  528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
[  528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
[  528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
[  528.279672]  </TASK>
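
Both traces in this message reach the mutex through the same frames, from __dev_close_many down to msi_domain_get_virq, and differ only in how the close was triggered (an ioctl from kexec in the first, bnxt_shutdown during reboot in the second). A minimal Python sketch for checking that kind of overlap between two saved traces follows. The helper names reliable_frames and common_prefix and the frame regex are illustrative assumptions derived from the log format above. Frames marked with "?" are skipped because the unwinder flags them as unreliable.

```python
import re

# Assumed frame format: "[ timestamp]  [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def reliable_frames(trace_text):
    """Return symbol names of frames not marked '?', innermost first."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames

def common_prefix(a, b):
    """Return the leading run of symbols that two frame lists share."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared

# Short excerpts from the two traces in this report.
first = """[  514.360954]  __mutex_lock.constprop.0+0x4be/0x8a0
[  514.417232]  msi_domain_get_virq+0xcc/0x110
[  514.913423]  __dev_change_flags+0x102/0x240
[  515.233545]  ? syscall_trace_enter+0x10c/0x1d0"""
second = """[  525.985718]  __mutex_lock.constprop.0+0x4be/0x8a0
[  526.041993]  msi_domain_get_virq+0xcc/0x110
[  526.538179]  dev_close_many+0xc2/0x1b0"""

print(common_prefix(reliable_frames(first), reliable_frames(second)))
```

Comparing by symbol name rather than by offset is deliberate here, since the offsets differ slightly between the rc1 and rc3 builds (for example irq_set_affinity_notifier+0xdd versus +0xe0).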

-- 
Himanshu Madhani	Oracle Linux Engineering
