Message-Id: <20230818161443.708785-1-thinhtr@linux.vnet.ibm.com>
Date: Fri, 18 Aug 2023 11:14:39 -0500
From: Thinh Tran <thinhtr@...ux.vnet.ibm.com>
To: kuba@...nel.org
Cc: aelior@...vell.com, davem@...emloft.net, edumazet@...gle.com,
manishc@...vell.com, netdev@...r.kernel.org, pabeni@...hat.com,
skalluru@...vell.com, VENKATA.SAI.DUGGI@....com,
Thinh Tran <thinhtr@...ux.vnet.ibm.com>,
Abdul Haleem <abdhalee@...ibm.com>,
David Christensen <drc@...ux.vnet.ibm.com>,
Simon Horman <simon.horman@...igine.com>,
Venkata Sai Duggi <venkata.sai.duggi@....com>
Subject: [Patch v6 0/4] bnx2x: Fix error recovering in switch configuration
System hangs and crashes were observed while injecting PCIe
errors into the upstream PCIe switch of a BCM57810 NIC.
After several calls to bnx2x_tx_timeout() complete,
bnx2x_nic_unload() is called to free up HW resources,
and bnx2x_napi_disable() is called to release NAPI objects.
Later, when the EEH driver calls bnx2x_io_slot_reset() to
complete the recovery process, bnx2x attempts to disable
NAPI again by calling bnx2x_napi_disable() and to free
resources that have already been freed, resulting in a
hang or crash.
This patch set introduces a new flag to track the HW
resource and NAPI allocation state, refactors duplicated
code into a single function, checks the page pool allocation
status before freeing, and reduces debug output when
a TX timeout event occurs.
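The flag-plus-common-helper approach described above can be
illustrated with a small userspace sketch of the general guard
pattern. This is not the driver's code. Only the nic_stopped
name comes from this series. struct fake_bp and the fake_*
helpers are hypothetical. The cover letter reports a hang or
crash on the repeated teardown, and the sketch models a
repeated NAPI disable or IRQ free as an assertion failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state; only the
 * nic_stopped name is taken from the patch series. */
struct fake_bp {
	bool nic_stopped;    /* true once IRQs and NAPI are released */
	bool napi_enabled;   /* models the NAPI "enabled" state */
	bool irqs_requested; /* models whether IRQs are currently held */
	int napi_disable_calls;
	int irq_free_calls;
};

/* Models disabling NAPI: a second call is treated as fatal,
 * standing in for the reported hang. */
static void fake_napi_disable(struct fake_bp *bp)
{
	assert(bp->napi_enabled && "NAPI disabled twice");
	bp->napi_enabled = false;
	bp->napi_disable_calls++;
}

/* Models releasing IRQs: freeing twice is treated as fatal. */
static void fake_free_irqs(struct fake_bp *bp)
{
	assert(bp->irqs_requested && "IRQs freed twice");
	bp->irqs_requested = false;
	bp->irq_free_calls++;
}

/* Hypothetical common stop helper: every teardown path (unload,
 * TX timeout, slot reset) calls this instead of open-coding the
 * sequence, and the flag turns repeat calls into no-ops. */
static void fake_stop_nic(struct fake_bp *bp)
{
	if (bp->nic_stopped)
		return;
	fake_napi_disable(bp);
	fake_free_irqs(bp);
	bp->nic_stopped = true;
}

/* Bringing the NIC back up re-arms the guard. */
static void fake_start_nic(struct fake_bp *bp)
{
	bp->napi_enabled = true;
	bp->irqs_requested = true;
	bp->nic_stopped = false;
}
```

Consider calling fake_stop_nic() from an unload path and then
again from a slot-reset path. NAPI is disabled and the IRQs are
freed exactly once. A later fake_start_nic() clears the flag, so
the next stop performs a real teardown again.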
Signed-off-by: Thinh Tran <thinhtr@...ux.vnet.ibm.com>
Reviewed-by: Manish Chopra <manishc@...vell.com>
Tested-by: Abdul Haleem <abdhalee@...ibm.com>
Tested-by: David Christensen <drc@...ux.vnet.ibm.com>
Reviewed-by: Simon Horman <simon.horman@...igine.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@....com>
v6:
- Clarifying and updating commit messages
v5:
- Breaking down into a series of individual patches
v4:
- factoring common code into new function bnx2x_stop_nic()
that disables and releases IRQs and NAPIs
v3:
- no changes, just repatched to the latest driver level
- updated the Reviewed-by tag from Manish (October 2022)
v2:
- Check the state of the NIC before disabling NAPI
and freeing the IRQs
- Prevent repeated TX timeouts by turning off the carrier
(calling netif_carrier_off()) in bnx2x_tx_timeout()
- Check and bail out early if fp->page_pool already freed
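The v2 page-pool change (bail out early if fp->page_pool was
already freed) follows a common free-once pattern. The
userspace sketch below illustrates that pattern only. The
page_pool member name mirrors the changelog. The fake_* types
and functions are hypothetical, and malloc/free stand in for
the kernel page pool API. The idea is to bail out when the
pointer is already NULL, then clear it after freeing, so a
second teardown never touches freed memory.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a page pool handle. */
struct fake_page_pool {
	int pages;
};

/* Hypothetical per-queue structure; the page_pool member name
 * mirrors fp->page_pool from the changelog. */
struct fake_fastpath {
	struct fake_page_pool *page_pool;
};

/* Allocate the pool; returns 0 on success, -1 on failure. */
static int fake_pool_create(struct fake_fastpath *fp)
{
	fp->page_pool = calloc(1, sizeof(*fp->page_pool));
	return fp->page_pool ? 0 : -1;
}

/* Bail out early if the pool was never created or was already
 * freed, and clear the pointer so a repeat call is a no-op. */
static void fake_pool_free(struct fake_fastpath *fp)
{
	if (!fp->page_pool)
		return;
	free(fp->page_pool);
	fp->page_pool = NULL;
}
```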
Thinh Tran (4):
bnx2x: new the bp->nic_stopped variable for checking NIC status
bnx2x: factor out common code to bnx2x_stop_nic()
bnx2x: Prevent access to a freed page in page_pool
bnx2x: prevent excessive debug information during a TX timeout
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 ++
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 33 ++++++++++++++-----
.../net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 4 +++
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 26 +++------------
.../net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c | 9 ++---
5 files changed, 37 insertions(+), 37 deletions(-)
--
2.27.0