Message-ID: <CACKFLik4bdefMtUe_CjKcQuekadPj+kqExUnNrim2qiyyhG-QQ@mail.gmail.com>
Date: Fri, 17 Jan 2025 09:44:12 -0800
From: Michael Chan <michael.chan@...adcom.com>
To: Breno Leitao <leitao@...ian.org>
Cc: pavan.chebbi@...adcom.com, netdev@...r.kernel.org, kuba@...nel.org,
kernel-team@...a.com
Subject: Re: bnxt_en: NETDEV WATCHDOG in 6.13-rc7
On Fri, Jan 17, 2025 at 4:08 AM Breno Leitao <leitao@...ian.org> wrote:
> Showing all locks held in the system:
> 7 locks held by kworker/u144:3/208:
> 4 locks held by kworker/u144:4/290:
> #0: ffff88811db39948 ((wq_completion)bnxt_pf_wq){+.+.}-{0:0}, at: process_one_work+0x1090/0x1950
> #1: ffffc9000303fda0 ((work_completion)(&bp->sp_task)){+.+.}-{0:0}, at: process_one_work+0x7eb/0x1950
> #2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: bnxt_reset+0x30/0xa0
> #3: ffff88811e41d160 (&bp->hwrm_cmd_lock){+.+.}-{4:4}, at: __hwrm_send+0x2f6/0x28d0
Since there is a TX timeout, we call bnxt_reset() from the
bnxt_sp_task() workqueue. rtnl_lock is held, and we also hold the
hwrm_cmd_lock mutex for every command we send to the firmware.
Perhaps there is a problem communicating with the firmware. If so,
each firmware command will time out after about a second with these
locks held. We send many commands to the firmware, so this can take a
while if the firmware is not responding.
> 3 locks held by kworker/u144:6/322:
> #0: ffff88810812a948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1090/0x1950
> #1: ffffc90003a4fda0 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work+0x7eb/0x1950
> #2: ffffffff86f71208 (rtnl_mutex){+.+.}-{4:4}, at: linkwatch_event+0xe/0x60
Meanwhile, linkwatch is waiting to acquire rtnl_lock.
>
>
> Full log at https://pastebin.com/4pWmaayt
>
I will take a closer look at the full log today. Thanks.