Message-ID: <20250117-frisky-macho-bustard-e92632@leitao>
Date: Fri, 17 Jan 2025 04:08:53 -0800
From: Breno Leitao <leitao@...ian.org>
To: michael.chan@...adcom.com, pavan.chebbi@...adcom.com
Cc: netdev@...r.kernel.org, kuba@...nel.org, kernel-team@...a.com
Subject: bnxt_en: NETDEV WATCHDOG in 6.13-rc7

Hello,

I am deploying 6.13-rc7 at commit 619f0b6fad52 ("Merge tag 'seccomp-v6.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux")
on a machine with a Broadcom BCM57452 NetXtreme-E 10Gb/25Gb/40Gb/50Gb NIC.
The machine's network is down, with some error messages and the NETDEV
WATCHDOG kicking in.

Are you guys familiar with something similar?

Here are some of the messages:

	 bnxt_en 0000:04:00.0 eth0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 5123 ms
	 bnxt_en 0000:04:00.0 eth0: TX timeout detected, starting reset task!
	 bnxt_en 0000:04:00.0 eth0: [0.0]: tx{fw_ring: 0 prod: a cons: 8}
	 bnxt_en 0000:04:00.0 eth0: [0]: rx{fw_ring: 0 prod: 1ff} rx_agg{fw_ring: 9 agg_prod: 7fc sw_agg_prod: 7fc}
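
For anyone reading the ring dump above, here is a minimal sketch that pulls the producer and consumer indices out of the tx line and computes how many descriptors are still outstanding. It assumes the indices are printed in hex (the rx values "1ff" and "7fc" suggest so) and that the difference can be masked with a 16-bit wrap. The function name tx_pending and the ring_mask default are hypothetical, chosen for illustration only, and are not taken from the driver.

```python
import re

# Matches the tx part of a ring dump line such as:
#   "[0.0]: tx{fw_ring: 0 prod: a cons: 8}"
TX_RE = re.compile(r"tx\{fw_ring: (\d+) prod: ([0-9a-f]+) cons: ([0-9a-f]+)\}")


def tx_pending(line, ring_mask=0xFFFF):
    """Return the number of outstanding tx descriptors, or None if no match.

    Assumptions: prod/cons are hex, and ring_mask (hypothetical default)
    covers the index wrap-around.
    """
    m = TX_RE.search(line)
    if m is None:
        return None
    prod = int(m.group(2), 16)
    cons = int(m.group(3), 16)
    return (prod - cons) & ring_mask


line = "bnxt_en 0000:04:00.0 eth0: [0.0]: tx{fw_ring: 0 prod: a cons: 8}"
print(tx_pending(line))  # prints 2
```

Under those assumptions, the queue that timed out has two descriptors posted by the driver that have not been reported as completed.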


Later, I get this hung task report:

	       Tainted: G                 N 6.13.0-rc7-kbuilder-00043-g619f0b6fad52 #3
	 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	 task:swapper/0       state:D stack:0     pid:1     tgid:1     ppid:0      flags:0x00004000
	 Call Trace:
	  <TASK>
	  __schedule+0xb72/0x3690
	  ? __pfx___schedule+0x10/0x10
	  ? __pfx_lock_release+0x10/0x10
	  schedule+0xea/0x3c0
	  async_synchronize_cookie_domain+0x1b8/0x210
	  ? __pfx_async_synchronize_cookie_domain+0x10/0x10
	  ? __pfx_autoremove_wake_function+0x10/0x10
	  ? kernel_init_freeable+0x500/0x6d0
	  ? __pfx_kernel_init+0x10/0x10
	  kernel_init+0x24/0x1e0
	  ? _raw_spin_unlock_irq+0x33/0x50
	  ret_from_fork+0x31/0x70
	  ? __pfx_kernel_init+0x10/0x10
	  ret_from_fork_asm+0x1a/0x30
	  </TASK>

	 Showing all locks held in the system:
	 3 locks held by kworker/u144:0/11:
	  #0: ffff88810a1b5948 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x1090/0x1950
	  #1: ffffc9000013fda0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x1950
	  #2: ffff8881128081b0 (&dev->mutex){....}-{4:4}, at: __driver_attach_async_helper+0xa4/0x260
	 1 lock held by khungtaskd/203:
	  #0: ffffffff8669a1e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x75/0x330
	 7 locks held by kworker/u144:3/208:
	 4 locks held by kworker/u144:4/290:
	  #0: ffff88811db39948 ((wq_completion)bnxt_pf_wq){+.+.}-{0:0}, at: process_one_work+0x1090/0x1950
	  #1: ffffc9000303fda0 ((work_completion)(&bp->sp_task)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x1950
	  #2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: bnxt_reset+0x30/0xa0
	  #3: ffff88811e41d160 (&bp->hwrm_cmd_lock){+.+.}-{4:4}, at: __hwrm_send+0x2f6/0x28d0
	 3 locks held by kworker/u144:6/322:
	  #0: ffff88810812a948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1090/0x1950
	  #1: ffffc90003a4fda0 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x1950
	  #2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: linkwatch_event+0xe/0x60
	 =============================================
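
To make the lock dump above easier to read, here is a rough sketch that groups the entries by lock address and reports locks that show up under more than one task. The regular expressions are my own guess at the line layout seen in this dump, not an official format description, and the names shared_locks and DUMP are hypothetical. DUMP is only an excerpt of the lines above. As far as I understand, lockdep can also list a lock under a task that is still waiting for it, so "held by" here may mean "holding or blocked on".

```python
import re
from collections import defaultdict

# "4 locks held by kworker/u144:4/290:"
HOLDER_RE = re.compile(r"(\d+) locks? held by (\S+):$")
# "#2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: bnxt_reset+0x30/0xa0"
LOCK_RE = re.compile(
    r"#\d+: ([0-9a-f]+) \((.+)\)\{[^}]*\}-\{[^}]*\}, at: (\S+)"
)


def shared_locks(dump):
    """Map (address, name) -> [(task, call site)] for locks seen under 2+ tasks."""
    tasks_by_lock = defaultdict(list)
    task = None
    for raw in dump.splitlines():
        line = raw.strip()
        holder = HOLDER_RE.search(line)
        if holder:
            task = holder.group(2)
            continue
        lock = LOCK_RE.search(line)
        if lock and task is not None:
            addr, name, site = lock.groups()
            tasks_by_lock[(addr, name)].append((task, site))
    return {k: v for k, v in tasks_by_lock.items() if len(v) > 1}


# Excerpt of the dump above (entries #0 and #1 of each task omitted).
DUMP = """
4 locks held by kworker/u144:4/290:
 #2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: bnxt_reset+0x30/0xa0
 #3: ffff88811e41d160 (&bp->hwrm_cmd_lock){+.+.}-{4:4}, at: __hwrm_send+0x2f6/0x28d0
3 locks held by kworker/u144:6/322:
 #2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: linkwatch_event+0xe/0x60
"""

for (addr, name), users in shared_locks(DUMP).items():
    print(name, addr)
    for task, site in users:
        print("   ", task, "at", site)
```

On this excerpt the only lock listed under two tasks is rtnl_mutex, once from bnxt_reset and once from linkwatch_event. One possible reading is that the reset worker took rtnl_mutex and is sitting in __hwrm_send while linkwatch waits behind it, but the dump alone does not prove that.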


Full log at https://pastebin.com/4pWmaayt

Thanks
--breno
