Message-ID: <004298f7-ae08-428e-9b98-995fc56e55b1@linux.intel.com>
Date: Wed, 13 Aug 2025 16:43:39 -0700
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Lukas Wunner <lukas@...ner.de>, Bjorn Helgaas <helgaas@...nel.org>
Cc: Riana Tauro <riana.tauro@...el.com>,
Aravind Iddamsetty <aravind.iddamsetty@...ux.intel.com>,
"Sean C. Dardis" <sean.c.dardis@...el.com>,
Terry Bowman <terry.bowman@....com>, Niklas Schnelle
<schnelle@...ux.ibm.com>, Linas Vepstas <linasvepstas@...il.com>,
Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
Oliver OHalloran <oohall@...il.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@....qualcomm.com>,
linuxppc-dev@...ts.ozlabs.org, linux-pci@...r.kernel.org,
Shahed Shaikh <shshaikh@...vell.com>, Manish Chopra <manishc@...vell.com>,
GR-Linux-NIC-Dev@...vell.com, Nilesh Javali <njavali@...vell.com>,
GR-QLogic-Storage-Upstream@...vell.com,
"James E.J. Bottomley" <James.Bottomley@...senPartnership.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
linux-scsi@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <"ku ba"@kernel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org
Subject: Re: [PATCH 4/5] PCI/ERR: Update device error_state already after reset
On 8/12/25 10:11 PM, Lukas Wunner wrote:
> After a Fatal Error has been reported by a device and has been recovered
> through a Secondary Bus Reset, AER updates the device's error_state to
> pci_channel_io_normal before invoking its driver's ->resume() callback.
>
> By contrast, EEH updates the error_state earlier, namely after resetting
> the device and before invoking its driver's ->slot_reset() callback.
> Commit c58dc575f3c8 ("powerpc/pseries: Set error_state to
> pci_channel_io_normal in eeh_report_reset()") explains in great detail
> that the earlier invocation is necessitated by various drivers checking
> accessibility of the device with pci_channel_offline() and avoiding
> accesses if it returns true. It returns true for any other error_state
> than pci_channel_io_normal.
>
> The device should be accessible already after reset, hence the reasoning
> is that it's safe to update the error_state immediately afterwards.
>
> This deviation between AER and EEH seems problematic because drivers
> behave differently depending on which error recovery mechanism the
> platform uses. Three drivers have gone so far as to update the
> error_state themselves, presumably to work around AER's behavior.
>
> For consistency, amend AER to update the error_state at the same recovery
> steps as EEH. Drop the now unnecessary workaround from the three drivers.
>
> Keep updating the error_state before ->resume() in case ->error_detected()
> or ->mmio_enabled() return PCI_ERS_RESULT_RECOVERED, which causes
> ->slot_reset() to be skipped. There are drivers doing this even for Fatal
> Errors, e.g. mhi_pci_error_detected().
>
> Signed-off-by: Lukas Wunner <lukas@...ner.de>
> ---
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 1 -
> drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 2 --
> drivers/pci/pcie/err.c | 3 ++-
> drivers/scsi/qla2xxx/qla_os.c | 5 -----
> 4 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index d7cdea8f604d..91e7b38143ea 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -4215,7 +4215,6 @@ static pci_ers_result_t qlcnic_83xx_io_slot_reset(struct pci_dev *pdev)
>  	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
>  	int err = 0;
>  
> -	pdev->error_state = pci_channel_io_normal;
>  	err = pci_enable_device(pdev);
>  	if (err)
>  		goto disconnect;
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 53cdd36c4123..e051d8c7a28d 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -3766,8 +3766,6 @@ static int qlcnic_attach_func(struct pci_dev *pdev)
>  	struct qlcnic_adapter *adapter = pci_get_drvdata(pdev);
>  	struct net_device *netdev = adapter->netdev;
>  
> -	pdev->error_state = pci_channel_io_normal;
> -
>  	err = pci_enable_device(pdev);
>  	if (err)
>  		return err;
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 930bb60fb761..bebe4bc111d7 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -153,7 +153,8 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
>
>  	device_lock(&dev->dev);
>  	pdrv = dev->driver;
> -	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
> +	if (!pci_dev_set_io_state(dev, pci_channel_io_normal) ||
> +	    !pdrv || !pdrv->err_handler || !pdrv->err_handler->slot_reset)
>  		goto out;
>  
>  	err_handler = pdrv->err_handler;
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index d4b484c0fd9d..4460421834cb 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -7883,11 +7883,6 @@ qla2xxx_pci_slot_reset(struct pci_dev *pdev)
>  	    "Slot Reset.\n");
>  
>  	ha->pci_error_state = QLA_PCI_SLOT_RESET;
> -	/* Workaround: qla2xxx driver which access hardware earlier
> -	 * needs error state to be pci_channel_io_online.
> -	 * Otherwise mailbox command timesout.
> -	 */
> -	pdev->error_state = pci_channel_io_normal;
>  
>  	pci_restore_state(pdev);
>
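
For anyone following along, the ordering issue described in the commit
message can be sketched with a small userspace model. This is not kernel
code: every model_* name below is hypothetical, and the state handling is
deliberately simplified (the only rule it keeps is that a permanently
failed device is never moved back to normal). It only illustrates why a
->slot_reset() callback that checks for an offline channel sees a
different answer depending on whether error_state is updated before or
after the callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; all model_* names are hypothetical. */
enum model_channel_state {
	MODEL_IO_NORMAL = 1,
	MODEL_IO_FROZEN,
	MODEL_IO_PERM_FAILURE,
};

struct model_dev {
	enum model_channel_state error_state;
};

/* Offline means any state other than normal, as the commit message says. */
static bool model_channel_offline(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->error_state != MODEL_IO_NORMAL;
}

/* Simplified assumption: a permanently failed device stays failed. */
static bool model_set_io_normal(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->error_state == MODEL_IO_PERM_FAILURE)
		return false;
	dev->error_state = MODEL_IO_NORMAL;
	return true;
}

/*
 * Models the slot reset step. Returns true if a driver callback that
 * guards hardware access with an offline check would go ahead and
 * touch the device.
 */
static bool model_report_slot_reset(struct model_dev *dev,
				    bool update_before_callback)
{
	if (update_before_callback && !model_set_io_normal(dev))
		return false;	/* callback is skipped entirely */

	/* The modeled driver callback refuses access while offline. */
	return !model_channel_offline(dev);
}
```

With update_before_callback set to false, which corresponds to the
behavior before this patch, a frozen device still looks offline inside
the callback. With it set to true, the callback sees a normal channel
unless the device has failed permanently.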
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer